Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor

ABSTRACT

Special reproduction control information comprises a plurality of items of frame information. Each of the items of frame information comprises video location information indicating the location of video data to be reproduced in a special reproduction and display time control information indicating the time for displaying the video data.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2000-200220, filed Jun. 30, 2000, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to a special reproduction control information describing method for describing special reproduction control information used to perform special reproduction for target video contents, a special reproduction control information creating method for creating the special reproduction control information and a special reproduction control information creating apparatus and a video reproduction apparatus and method for performing special reproduction by using the special reproduction control information.

[0004] 2. Description of the Related Art

[0005] In recent years, motion pictures have been compressed as digital video and stored in disk media such as a DVD or an HDD so that the video can be accessed at random for reproduction. Reproduction can be started from a desired point with virtually no waiting time. As with conventional tape media, disk media can be reproduced fast at two to four times speed or can be reproduced in reverse.

[0006] However, a video is very long in many cases, and the viewing time cannot be sufficiently compressed to view the whole contents of the video even with fast reproduction at two to four times speed. When the rate of the fast reproduction is increased, the scene changes faster than the viewer can follow, so that grasping the contents is difficult, and even portions which are not needed are reproduced, which wastes time.

BRIEF SUMMARY OF THE INVENTION

[0007] Accordingly, the present invention is directed to a method and apparatus that substantially obviate one or more of the problems due to limitations and disadvantages of the related art.

[0008] According to one aspect of the present invention, a method of describing frame information comprises:

[0009] describing, for a frame extracted from a plurality of frames in a source video data, first information specifying a location of the extracted frame in the source video data; and

[0010] describing, for the extracted frame, second information relating to a display time of the extracted frame.

[0011] According to another aspect of the present invention, an article of manufacture comprising a computer usable medium storing frame information, the frame information comprises:

[0012] first information, described for a frame extracted from a plurality of frames, specifying a location of the extracted frame in the source video data; and

[0013] second information, described for the extracted frame, relating to a display time of the extracted frame.

[0014] According to another aspect of the present invention, an apparatus for creating frame information comprises:

[0015] a unit configured to extract a frame from a plurality of frames in a source video data;

[0016] a unit configured to create the frame information including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame; and

[0017] a unit configured to link the extracted frame to the frame information.

[0018] According to another aspect of the present invention, a method of creating frame information comprises:

[0019] extracting a frame from a plurality of frames in a source video data; and

[0020] creating the frame information including first information specifying a location of the extracted frame in the source video data and second information relating to a display time of the extracted frame.

[0021] According to another aspect of the present invention, an apparatus for performing a special reproduction comprises:

[0022] a unit configured to refer to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame in the source video data and second information relating to a display time of the extracted frame;

[0023] a unit configured to obtain the video data corresponding to the extracted frame based on the first information;

[0024] a unit configured to determine the display time of the extracted frame based on the second information; and

[0025] a unit configured to display the obtained video data for the determined display time.

[0026] According to another aspect of the present invention, a method of performing a special reproduction comprises:

[0027] referring to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame;

[0028] obtaining the video data corresponding to the extracted frame based on the first information;

[0029] determining the display time of the extracted frame based on the second information; and

[0030] displaying the obtained video data for the determined display time.

[0031] According to another aspect of the present invention, an article of manufacture comprising a computer usable medium having computer readable program code means embodied therein, the computer readable program code means performing a special reproduction, the computer readable program code means comprises:

[0032] computer readable program code means for causing a computer to refer to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame;

[0033] computer readable program code means for causing a computer to obtain the video data corresponding to the extracted frame based on the first information;

[0034] computer readable program code means for causing a computer to determine the display time of the extracted frame based on the second information; and

[0035] computer readable program code means for causing a computer to display the obtained video data for the determined display time.

[0036] According to another aspect of the present invention, a method of describing sound information comprises:

[0037] describing, for a frame extracted from a plurality of sound frames in a source sound data, first information specifying a location of the extracted frame in the source sound data; and

[0038] describing, for the extracted frame, second information relating to a reproduction start time and reproduction time of the sound data of the extracted frame.

[0039] According to another aspect of the present invention, an article of manufacture comprising a computer usable medium storing frame information, the frame information comprises:

[0040] first information, described for a frame extracted from a plurality of sound frames, specifying a location of the extracted frame in the source sound data; and

[0041] second information, described for the extracted frame, relating to a reproduction start time and reproduction time of the sound data of the extracted frame.

[0042] According to another aspect of the present invention, a method of describing text information comprises:

[0043] describing, for a frame extracted from a plurality of text frames in a source text data, first information specifying a location of the extracted frame in the source text data; and

[0044] describing, for the extracted frame, second information relating to a display start time and display time of the text data of the extracted frame.

[0045] According to another aspect of the present invention, an article of manufacture comprising a computer usable medium storing frame information, the frame information comprises:

[0046] first information, described for a frame extracted from a plurality of text frames in a source text data, specifying a location of the extracted frame in the source text data; and

[0047] second information, described for the extracted frame, relating to a display start time and display time of the text data of the extracted frame.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

[0048] FIG. 1 is a view showing an example of a data structure of special reproduction control information according to one embodiment of the present invention;

[0049] FIG. 2 is a view showing an example of a structure of a special reproduction control information creating apparatus;

[0050] FIG. 3 is a view showing another example of a structure of the special reproduction control information creating apparatus;

[0051] FIG. 4 is a flowchart showing one example for the apparatus shown in FIG. 2;

[0052] FIG. 5 is a flowchart showing one example for the apparatus shown in FIG. 3;

[0053] FIG. 6 is a view showing an example of a structure of a video reproduction apparatus;

[0054] FIG. 7 is a flowchart showing one example for the apparatus shown in FIG. 6;

[0055] FIG. 8 is a view showing an example of a data structure of special reproduction control information;

[0056] FIG. 9 is a view explaining video location information for referring to an original video frame;

[0057] FIG. 10 is a view explaining video location information for referring to an image data file;

[0058] FIG. 11 is a view explaining a method for extracting video data in accordance with a motion of a screen;

[0059] FIG. 12 is a view explaining video location information for referring to the original video frame;

[0060] FIG. 13 is a view for explaining video location information for referring to the image data file;

[0061] FIG. 14 is a view showing an example of a data structure of special reproduction control information in which plural original video frames are referred to;

[0062] FIG. 15 is a view explaining a relation between the video location information and the original plural video frames;

[0063] FIG. 16 is a view explaining a relation between the image data file and the original plural video frames;

[0064] FIG. 17 is a view explaining video location information for referring to the original video frame;

[0065] FIG. 18 is a view for explaining video location information for referring to the image data file;

[0066] FIG. 19 is a flowchart for explaining a special reproduction;

[0067] FIG. 20 is a view for explaining a method for extracting video data in accordance with a motion of a screen;

[0068] FIG. 21 is a view for explaining a method for extracting video data in accordance with a motion of a screen;

[0069] FIG. 22 is a flowchart showing one example for calculating display time at which a scene change quantity becomes constant as much as possible;

[0070] FIG. 23 is a flowchart showing one example for calculating a scene change quantity of the whole frame from an MPEG video;

[0071] FIG. 24 is a view for explaining a method for calculating a scene change quantity of a video from an MPEG stream;

[0072] FIG. 25 is a view for explaining a processing procedure for calculating display time at which a scene change quantity becomes constant as much as possible;

[0073] FIG. 26 is a flowchart showing one example of the processing procedure for conducting special reproduction on the basis of special reproduction control information;

[0074] FIG. 27 is a flowchart showing one example for conducting special reproduction on the basis of a display cycle;

[0075] FIG. 28 is a view for explaining a relationship between a calculated display time and the display cycle;

[0076] FIG. 29 is a view for explaining a relationship between a calculated display time and the display cycle;

[0077] FIG. 30 is a view showing another example of a data structure of special reproduction control information;

[0078] FIG. 31 is a view explaining a method for extracting video data in accordance with a motion of a screen;

[0079] FIG. 32 is a view explaining video location information for referring to the original video frame;

[0080] FIG. 33 is a view showing another example of a data structure of special reproduction control information;

[0081] FIG. 34 is a view showing another example of a data structure of special reproduction control information;

[0082] FIG. 35 is a view showing another example of a data structure of special reproduction control information;

[0083] FIG. 36 is a flowchart showing one example for calculating display time from the importance;

[0084] FIG. 37 is a view for explaining a method for calculating display time from the importance;

[0085] FIG. 38 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene having a large sound level is important;

[0086] FIG. 39 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene on which many important words appear with sound recognition is important, or a processing procedure for calculating importance data on the basis of the idea that the scene in which the number of words talked per time is many is important;

[0087] FIG. 40 is a flowchart showing one example for calculating importance data on the basis of the idea that a scene on which many important words appear with telop recognition is important, or a processing procedure for calculating importance data on the basis of the idea that the scene in which the number of words included in the telop which appears per time is large with telop recognition is important;

[0088] FIG. 41 is a flowchart showing one example for calculating importance data on the basis of the idea that the scene in which a large character appears as a telop is important;

[0089] FIG. 42 is a flowchart showing one example for calculating importance data on the basis of the idea that the scene in which many human faces appear is important or a processing for calculating importance data on the basis of the idea that the scene where human faces are displayed in an enlarged manner is important;

[0090] FIG. 43 is a flowchart showing one example for calculating importance data on the basis of the idea that the scene in which videos similar to the registered important scene appear is important;

[0091] FIG. 44 is a view showing another example of a data structure of special reproduction control information;

[0092] FIG. 45 is a view showing another example of a data structure of special reproduction control information;

[0093] FIG. 46 is a view showing another example of a data structure of special reproduction control information;

[0094] FIG. 47 is a view for explaining a relationship between information as to whether the scene is to be reproduced or not and the reproduced video;

[0095] FIG. 48 is a flowchart showing one example of a processing procedure of special reproduction including reproduction and non-reproduction judgment;

[0096] FIG. 49 is a view showing one example of a data structure when sound information or text information is added;

[0097] FIG. 50 is a view showing one example of a data structure for describing only sound information separately from frame information;

[0098] FIG. 51 is a view showing one example of a data structure for describing only text information separately from frame information;

[0099] FIG. 52 is a view for explaining a synchronization of a reproduction of each of media;

[0100] FIG. 53 is a flowchart showing one example of a determination procedure of a sound reproduction start time and a sound reproduction time in a video frame section;

[0101] FIG. 54 is a flowchart showing one example for preparing reproduction sound data and correcting video frame display time;

[0102] FIG. 55 is a flowchart showing one example of a processing procedure of obtaining text information with telop recognition;

[0103] FIG. 56 is a flowchart showing one example of a processing procedure of obtaining text information with sound recognition;

[0104] FIG. 57 is a flowchart showing one example of a processing procedure of preparing text information;

[0105] FIGS. 58A and 58B are views for explaining a method of displaying text information;

[0106] FIG. 59 is a view showing one example of a data structure of special reproduction control information for sound information;

[0107] FIG. 60 is a view showing another example of a data structure of special reproduction control information for sound information;

[0108] FIG. 61 is a view explaining a summary reproduction of the sound/music data; and

[0109] FIG. 62 is a view explaining another summary reproduction of the sound/music data.

DETAILED DESCRIPTION OF THE INVENTION

[0110] Preferred embodiments of the present invention will now be described with reference to the accompanying drawings.

[0111] The embodiments relate to a reproduction of video contents having video data using special reproduction control information. The video data comprises a set of video frames (video frame group) constituting a motion picture.

[0112] The special reproduction control information is created from the video data by a special reproduction control information creating apparatus and attached to the video data. The special reproduction is reproduction by a method other than a normal reproduction. The special reproduction includes a double speed reproduction (or a high speed reproduction), a jump reproduction (or jump continuous reproduction), and a trick reproduction. The trick reproduction includes a substituted reproduction, an overlapped reproduction, a slow reproduction and the like. The special reproduction control information is referred to when the special reproduction is executed in the video reproduction apparatus.

[0113] FIG. 1 shows one example of a basic data structure of the special reproduction control information.

[0114] In this data structure, plural items of frame information “i” (i=1 to N) are described in correspondence to the frame appearance order in the video data. Each frame information 100 includes a set of video location information 101 and display time control information 102. The video location information 101 indicates a location of video data to be displayed at the time of special reproduction. The video data to be displayed may be one frame, a group of a plurality of continuous frames, or a group formed of a part of a plurality of continuous frames. The display time control information 102 forms the basis of calculating the display time of the video data.

[0115] In FIG. 1, the frame information “i” is arranged in an order of the appearance of frames in the video data. When information indicating an order of frame information is described in the frame information “i”, the frame information “i” may be arranged and described in any order.

[0116] The reproduction rate information 103 attached to a plurality of items of frame information “i” shows the reproduction speed rate and is used for designating reproduction at a speed several times higher than that corresponding to the display time described by the display time control information 102. However, the reproduction rate information 103 is not essential information. The information 103 may always be attached, never be attached, or be selectively attached. Even when the reproduction rate information 103 is attached, it need not be used at the time of special reproduction: it may always be used, never be used, or be selectively used.

[0117] In FIG. 1, it is possible to further add other control information to the frame information group together with the reproduction rate information or in place of the reproduction rate information. In FIG. 1, it is also possible to add different control information to each frame information “i”. In these cases, the video reproduction apparatus may use all of the information included in the special reproduction control information, or only a part of it.
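
As a purely illustrative sketch, and not as part of the described embodiments, the data structure of FIG. 1 might be modeled as follows. The class names, the field names, the use of a frame number as the video location information 101, and the use of seconds as the display time control information 102 are all assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameInfo:
    """One item of frame information "i" (frame information 100)."""
    video_location: int          # video location information 101 (assumed: a frame number)
    display_time_control: float  # display time control information 102 (assumed: seconds)


@dataclass
class SpecialReproductionControlInfo:
    """Group of frame information plus the optional reproduction rate information 103."""
    frames: List[FrameInfo] = field(default_factory=list)
    reproduction_rate: Optional[float] = None  # not essential, so it may be absent

    def total_display_time(self) -> float:
        """Sum of the display times, shortened by the reproduction rate if one is attached."""
        rate = 1.0 if self.reproduction_rate is None else self.reproduction_rate
        return sum(f.display_time_control for f in self.frames) / rate
```

For instance, two items with display times of 2.0 and 1.0 seconds and a reproduction rate of 2.0 would, under these assumptions, be displayed in 1.5 seconds in total.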

[0118] FIG. 2 shows an example of a structure of an apparatus for creating special reproduction control information.

[0119] This special reproduction control information creating apparatus comprises a video data storage unit 2, a video data processing unit 1 including a video location information processing unit 11 and a display time control information processing unit 12, and a special reproduction control information storage unit 3. As will be described later in detail, since the video data (encoded data) must be decoded before being displayed, a processing time required for decoding the video data elapses from when the display instruction is issued until the video is displayed. In order to reduce this processing time, it is proposed to decode the video data beforehand and store the result as an image data file.

[0120] If an image data file is used (the image data file may be always used or selectively used), an image data file creating unit 13 (in the video data processing unit 1) and an image data file storage unit 4 are further provided as shown in FIG. 3. If other control information determined on the basis of the video data is added to the special reproduction control information, the corresponding function is appropriately added to the inside of the video data processing unit 1.

[0121] If an operation by a user intervenes in this processing, a GUI (omitted in FIGS. 2 and 3) is used, for example, for displaying video data in frame units and for receiving an input of an instruction from the user.

[0122] In FIGS. 2 and 3, a CPU, a memory, an external storage device and a network communication device, which are provided when needed, and software such as driver software and an OS, which is used when needed, are not shown.

[0123] The video data storage unit 2 stores video data which becomes a target of processing for creating special reproduction control information (or special reproduction control information and image data files).

[0124] The special reproduction control information storage unit 3 stores special reproduction control information that has been created.

[0125] The image data file storage unit 4 stores image data files that have been created.

[0126] The storage units 2, 3, and 4 comprise, for example, a hard disk, an optical disk or a semiconductor memory. The storage units 2, 3, and 4 may comprise separate storage devices. All or part of the storage units may comprise the same storage device.

[0127] The video data processing unit 1 creates the special reproduction control information (or the special reproduction control information and image data file) on the basis of the video data which becomes a target of processing.

[0128] The video location information processing unit 11 determines (extracts) a video frame (group) which should be displayed or which can be displayed at the time of special reproduction, and prepares the video location information 101 which should be described in each frame information “i”.

[0129] The display time control information processing unit 12 prepares the display time control information 102 associated with the display time of the video frame (group) associated with each frame information “i”.

[0130] The image data file creating unit 13 prepares an image data file from the video data.

[0131] The special reproduction control information creating apparatus can be realized, for example, by executing software on a computer. The apparatus may also be realized as a dedicated apparatus for creating the special reproduction control information.

[0132] FIG. 4 shows an example of a processing procedure for the structure of FIG. 2. The video data is read (step S11), video location information 101 is created (step S12), display time control information 102 is created (step S13), and special reproduction control information is stored (step S14). The procedure of FIG. 4 may be conducted consecutively for each frame information, or each processing may be conducted in batches. Other procedures can also be used.
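
The batch form of steps S11 to S14 can be sketched as below. This is a minimal sketch under stated assumptions, not the procedure of the embodiment itself: the function name `create_control_info`, the two callback parameters, the dictionary keys, and the use of a JSON string in place of the storage unit 3 are all hypothetical.

```python
import json
from typing import Callable, List, Sequence


def create_control_info(
    video_frames: Sequence[object],
    select_locations: Callable[[Sequence[object]], List[int]],
    display_time_for: Callable[[Sequence[object], int], float],
) -> str:
    """Create special reproduction control information in one batch.

    Step S11 is assumed to have happened already: video_frames holds the read video data.
    """
    # Step S12: create the video location information 101 (assumed: frame numbers).
    locations = select_locations(video_frames)
    # Step S13: create the display time control information 102 for each location.
    items = [
        {"video_location": loc, "display_time": display_time_for(video_frames, loc)}
        for loc in locations
    ]
    # Step S14: "store" the result; a JSON string stands in for storage unit 3 here.
    return json.dumps({"frame_information": items})
```

A caller might, for example, pass a selector that picks every fifth frame and a display time function that returns a constant; either callback can be swapped without changing the order of the steps.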

[0133] FIG. 5 shows an example of a processing procedure for the structure of FIG. 3. A procedure for preparing and storing image data files (step S22) is added to the procedure of FIG. 4. The image data file is created and/or stored together with the preparation of the video location information 101. It is also possible to create the video location information 101 at a timing different from that of FIG. 4. In the same manner as in the case of FIG. 4, the procedure of FIG. 5 may be conducted for each frame information, or may be conducted in batches. Other procedures can also be used.

[0134] FIG. 6 shows an example of a video reproduction apparatus.

[0135] This video reproduction apparatus comprises a controller 21, a normal reproduction processing unit 22, a special reproduction processing unit 23, a display device 24, and a contents storage unit 25. If contents are handled in which audio such as sound is added to the video data, it is preferable to provide a sound output section. If contents are handled in which text data is added to the video data, the text may be displayed on the display device 24, or may be output from the sound output section. If contents are handled to which a program is attached, an attached program execution section may be provided.

[0136] The contents storage unit 25 stores at least video data and special reproduction control information. As will be described later in detail, in the case where the image data file is used, the image data file is further stored. The sound data, the text data, and the attached program are further stored in some cases.

[0137] The contents storage unit 25 may be arranged at one location in a concentrated manner, or may be arranged in a distributed manner. The point is that the contents can be accessed by the normal reproduction processing unit 22 and the special reproduction processing unit 23. The video data, special reproduction control information, image data files, sound data, text data, and attached program may be stored in separate media or may be stored in the same medium. As the medium, for example, a DVD is used. These may also be data which are transmitted via a network.

[0138] The controller 21 basically receives an instruction such as a normal reproduction or a special reproduction with respect to the contents from the user via a user interface such as a GUI. The controller 21 instructs the corresponding processing unit to reproduce the designated contents by the designated method.

[0139] The normal reproduction processing unit 22 is used for the normal reproduction of the designated contents.

[0140] The special reproduction processing unit 23 is used for the special reproduction (for example, a high speed reproduction, jump reproduction, trick reproduction, or the like) of the designated contents by referring to the special reproduction control information.

[0141] The display device 24 is used for displaying a video.

[0142] The video reproduction apparatus can be realized by computer software. It may partially be realized by hardware (for example, a decode board (MPEG-2 decoder) or the like). The video reproduction apparatus may be realized as a dedicated device for video reproduction.

[0143] FIG. 7 shows one example of a reproduction processing procedure of the video reproduction apparatus of FIG. 6. At step S31, it is determined whether the user requests a normal reproduction or a special reproduction. When a normal reproduction is requested, the designated video data is read at step S32 and a normal reproduction is conducted at step S33. When a special reproduction is requested by the user, the special reproduction control information corresponding to the designated video data is read at step S34, and the location of the video data to be displayed is specified and the display time is determined at step S35. The corresponding frame (group) is read from the video data (or the image data file) at step S36 to conduct special reproduction of the designated contents at step S37. The location of the video data can be specified and the display time can be determined at a timing different from that in FIG. 7. The procedure of the special reproduction of FIG. 7 may be conducted consecutively for each frame information, or each processing may be conducted in batches. Other procedures can also be used. For example, in the case of a reproduction method in which the display time of each frame is equally set to a constant value, it is not necessary to determine the display time.
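
The branch at step S31 can be illustrated with a short sketch that returns a display schedule instead of driving a real display device. The function name `plan_reproduction`, the mode strings, the dictionary keys, and the representation of a schedule as (frame number, display time in seconds) pairs are assumptions introduced only for this example.

```python
from typing import Dict, List, Optional, Tuple


def plan_reproduction(
    mode: str,
    num_frames: int,
    fps: float,
    control_info: Optional[List[Dict[str, float]]] = None,
) -> List[Tuple[int, float]]:
    """Return (frame number, display time in seconds) pairs for the requested mode."""
    # Step S31: determine which kind of reproduction the user requested.
    if mode == "normal":
        # Steps S32-S33: every frame of the video data is shown for one frame period.
        return [(i, 1.0 / fps) for i in range(num_frames)]
    if mode == "special":
        if control_info is None:
            raise ValueError("special reproduction needs special reproduction control information")
        # Steps S34-S35: read the control information, specify locations and display times.
        # Steps S36-S37 (reading and displaying the frames) are left to the caller.
        return [(int(item["video_location"]), float(item["display_time"])) for item in control_info]
    raise ValueError(f"unknown reproduction mode: {mode}")
```

Separating the schedule from the display step mirrors the remark that the location and the display time can be determined at a timing different from that in FIG. 7.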

[0144] Both in the normal reproduction and in the special reproduction, the user may make various designations (for example, the start point or the end point of the reproduction in the contents, a reproduction speed in the high speed reproduction, a reproduction time in the high speed reproduction, and other items such as the method of special reproduction).

[0145] Next, an algorithm for creating the frame information of the special reproduction control information and an algorithm for calculating the display time of the special reproduction will be schematically explained.

[0146] At the time of creating the frame information, the frame information to be used at the time of the special reproduction is determined from the video data, the video location information is created, and the display time control information is created.

[0147] The frame is determined by such methods as: (1) a method for calculating the video frame on the basis of some characteristic quantity with respect to the video data (for example, a method for extracting the video frames such that the total of a characteristic quantity (for example, the scene change quantity) between the extracted frames becomes constant, and a method for extracting the video frames such that the total of importance between the extracted frames becomes constant), and (2) a method for calculating the video frame on a fixed standard (for example, a method for extracting frames at random, and a method for extracting frames at an equal interval). The scene change quantity is also called a frame activity value.
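
To make methods (1) and (2) concrete, the following sketch shows one possible reading of each. It assumes that a per-frame scene change quantity is already available as a list of numbers; how that quantity is computed is outside this example. The function names, the `threshold` parameter, and the greedy accumulation rule are assumptions, and the accumulated total between extracted frames is only approximately constant because a frame is extracted as soon as the total reaches the threshold.

```python
from typing import List, Sequence


def extract_equal_interval(num_frames: int, interval: int) -> List[int]:
    """Method (2): extract frame numbers at an equal interval."""
    return list(range(0, num_frames, interval))


def extract_constant_change(change: Sequence[float], threshold: float) -> List[int]:
    """Method (1): extract frames so the accumulated scene change between them is roughly constant.

    change[i] is the scene change quantity between frame i-1 and frame i
    (change[0] is ignored). Frame 0 is always extracted.
    """
    extracted = [0]
    accumulated = 0.0
    for i in range(1, len(change)):
        accumulated += change[i]
        if accumulated >= threshold:
            extracted.append(i)   # enough change has built up since the last extracted frame
            accumulated = 0.0
    return extracted
```

With this rule, stretches with little motion yield few extracted frames and stretches with rapid change yield many, which is the intent of extracting on a characteristic quantity rather than on a fixed standard.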

[0148] For the creation of the display time control information 102, there are available: (i) a method for calculating an absolute value or a relative value of the display time or a display frame number, (ii) a method for calculating reference information which is a base of the display time and a display frame number (for example, the information designated by the user, characters in the video, sound synchronized with the video, persons in the video, and the importance obtained on the basis of a specific pattern in the video), and (iii) a method for describing both (i) and (ii).

[0149] It is possible to appropriately combine (1) or (2) with (i), (ii) or (iii). Needless to say, other methods are possible. One specific combination of such methods can be used, or a plurality of combinations of these methods may be made available and appropriately selected.

[0150] In a specific case, a relative value of the display time and the number of display frames are determined at the same time as the frame is determined by the method (1). If this method is always used, it is possible to omit the display time control information processing unit 12.

[0151] At the time of the special reproduction, it is assumed that the special reproduction is conducted by referring to the display time control information 102 of (i), (ii) or (iii) included in the frame information. However, the described value may be used as it is, or may be corrected and then used. In addition to the described value and the corrected value thereof, other independently created information and information input by the user may be used. Alternatively, only the other independently created information and the information input by the user may be used. A plurality of these methods may be made available and appropriately selected.

[0152] Next, an outline of the special reproduction will be explained.

[0153] A double speed reproduction (or a high speed reproduction) carries out reproduction in a time shorter than the time required for the normal reproduction of the original contents by reproducing only a part of the frames constituting the video data contents. For example, the frames indicated by the frame information are displayed in time sequence, each for the display time indicated by the display time control information 121. The display time of each frame (or frame group) is determined so as to satisfy a request from the user, such as a speed designation request designating at how many times the normal reproduction speed the original contents are to be reproduced (that is, in what fraction of the time required for the normal reproduction), or a time designation request designating how much time is to be taken for reproducing the contents. The high speed reproduction is also called a summarized reproduction.

[0154] A jump reproduction (or a jump continuous reproduction) is a high speed reproduction in which a part of the frames shown in the frame information is not reproduced, for example, on the basis of the reproduction/non-reproduction information described later. The high speed reproduction is conducted with respect to the frames shown in the frame information, excluding the frames which are subjected to non-reproduction.

[0155] A trick reproduction is any reproduction other than the normal reproduction, excluding the high speed reproduction and the jump reproduction. For example, when the frames shown in the frame information are reproduced, various forms can be considered, such as: a substituted reproduction, in which a certain portion is reproduced with the order of the time sequence changed; an overlapped reproduction, in which a certain portion is reproduced repeatedly a plurality of times; a variable speed reproduction, in which a certain portion is reproduced at a speed lower than that of another portion (including the case in which the portion is reproduced at the normal reproduction speed and the case in which it is reproduced at a speed lower than the normal reproduction speed) or at a speed higher than that of another portion, or in which the reproduction of a certain portion is temporarily suspended, or in which such forms of reproduction are appropriately combined; and a random reproduction, in which each fixed set of frames shown in the frame information is reproduced in a random time sequence.

[0156] Needless to say, a plurality of these methods can be appropriately combined. Various variations can be considered; for example, during the double speed reproduction, an important portion may be reproduced a plurality of times, or its reproduction speed may be set to the normal reproduction speed.

[0157] Hereinafter, embodiments of the present invention will be explained in detail.

[0158] First, the embodiments will be explained by taking as an example a case in which a reproduction frame is determined on the basis of the scene change quantity between adjacent frames as the characteristic quantity of the video data.

[0159] Here, a case in which one frame corresponds to one item of frame information will be explained.

[0160] FIG. 8 shows one example of a data structure of the special reproduction control information created for the target video data.

[0161] In this data structure, the display time information 121, which is information showing an absolute or a relative display time, is described as the display time control information 102 in FIG. 1 (or instead of the display time control information 102). A structure describing the importance in addition to the display time control information 102 will be described later.

[0162] The video location information 101 is information which enables the location of the video within the original video to be specified; either a frame number (for example, a sequence number counted from the first frame) or a value which specifies one frame in a stream, such as a time stamp, may be used. If the video data corresponding to the frame extracted from the original video stream is stored as a separate file, a URL or the like may be used as information for specifying the file location.

[0163] The display time information 121 is information which specifies the time for which the video is displayed or the number of frames. It can be described either as an actual time or number of frames, or as a relative value (for example, a normalized numeric value) which expresses the relative time length with respect to the display time information described in the other items of frame information. In the latter case, the actual reproduction time of each video is calculated from the total reproduction time as a whole. Instead of describing the continuation time of the display for each video, a description combining a start time measured from a specific timing (for example, with the start time of the first video set to 0) with an end time, or a description combining the start time with the continuation time, may be used.
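The conversion from relative values to actual display times can be illustrated by the following minimal sketch. The function name `actual_display_times` and the assumption that each relative value is scaled by the ratio of the total reproduction time to the sum of the relative values are illustrative only and are not taken from the specification.

```python
from typing import List


def actual_display_times(relative_values: List[float], total_time: float) -> List[float]:
    """Scale relative display values so that they sum to total_time (seconds).

    Assumption: each item receives a share of the total reproduction time
    proportional to its relative value.
    """
    total = sum(relative_values)
    if total <= 0:
        raise ValueError("relative values must have a positive sum")
    return [total_time * r / total for r in relative_values]


# Three items with relative lengths 1 : 2 : 1 reproduced in 8 seconds overall.
times = actual_display_times([1, 2, 1], 8.0)
```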

[0164] In the special reproduction, basically, the video present at the location specified by the video location information 101 is reproduced for the display time specified by the display time information 121, and this is repeated consecutively for each of the items of frame information “i” included in the arrangement, as shown in FIG. 8.
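As a rough illustration of the arrangement of frame information and its consecutive reproduction, the following sketch models one item of frame information as a pair of video location and display time and derives a reproduction schedule. The names `FrameInfo` and `build_schedule` and the field layout are hypothetical; the actual data structure is that of FIG. 8.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Location = Union[int, str]  # frame number, time stamp, or URL (assumed encodings)


@dataclass
class FrameInfo:
    video_location: Location  # corresponds to the video location information 101
    display_time: float       # corresponds to the display time information 121 (seconds)


def build_schedule(items: List[FrameInfo]) -> List[Tuple[Location, float, float]]:
    """Return (location, start, end) for each item, reproduced back to back."""
    schedule = []
    clock = 0.0
    for item in items:
        schedule.append((item.video_location, clock, clock + item.display_time))
        clock += item.display_time
    return schedule


schedule = build_schedule([FrameInfo(0, 1.5), FrameInfo(120, 0.5)])
```

The start and end times produced here correspond to the alternative description in which a start time and an end time are given for each video.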

[0165] If the start time and the end time, or the start time and the continuation time, are specified and this designation is followed, the video present at the location specified by the video location information 101 is reproduced from the start time specified by the display time information 121 up to the end time, or for the continuation time, and this is repeated consecutively for each of the items of frame information “i” included in the arrangement.

[0166] The described display time can also be processed before reproduction by using parameters such as reproduction rate information and additional information.

[0167] Next, a method for describing the video location information will be explained with reference to FIGS. 9 through 11.

[0168] FIG. 9 explains a method for describing the video location information by referring to the original video frame.

[0169] In FIG. 9, a time axis 200 corresponds to the original video stream from which the frame information for the special reproduction is created, and a video 201 corresponds to one frame in the video stream which becomes a description target. A time axis 202 corresponds to the reproduction time of the video at the time of the special reproduction using the videos 201 extracted from the original video stream. A display time 203 is the section on the time axis 202 corresponding to one video 201. For example, the video location information 101 showing the location of the video 201 and the video display time 121 showing the length of the display time 203 are described as frame information. As described above, the location of the video 201 may be described in any form, such as a frame number or a time stamp, as long as one frame in the original video stream can be specified. Frame information is described in the same manner for the other videos 201.

[0170] FIG. 10 explains a method for describing the video location information by referring to an image data file.

[0171] The method for describing the video location information shown in FIG. 9 directly refers to the frame in the original video which is to be subjected to the special reproduction. In contrast, in the method shown in FIG. 10, an image data file 300 corresponding to a single frame 302 extracted from the original video stream is created as a separate file, and the location thereof is described. The file location can be described in the same manner, by using, for example, a URL or the like, both in the case where the file is present on a local storage device and in the case where the file is present on a network. A set of the video location information 101 showing the location of this image data file and the video display time 121 showing the length of the corresponding display time 301 is described as frame information.

[0172] If a correspondence to the original video frame is required, information showing the single frame 302 of the original video corresponding to the described frame information (similar to, for example, the video location information in the case of FIG. 9) may be included in the frame information. In that case, the frame information comprises the video location information, the display time information and the original video information. When the original video information is not required, it need not be described.

[0173] The configuration of the video data described with the method of FIG. 10 is not particularly restricted. For example, the frame of the original video may be used as it is or may be reduced. Using a reduced video is effective for conducting the reproduction processing at a high speed because the original video need not be decompressed.

[0174] If the original video stream is compressed by means of MPEG-1, MPEG-2 or the like, a reduced video can be created at a high speed by only partially decoding the stream. In this method, only the DCT (discrete cosine transform) coefficients of an I picture frame (an intra-frame encoded frame) are decoded, and a reduced video is created by using the DCT coefficients.

[0175] In the description method of FIG. 10, the image data files are stored as separate files. However, they may be stored together in a video data group storage file having a video format which can be accessed at random (for example, Motion JPEG). In this case, the location of the video data is specified by a combination of the URL showing the location of the image data file and a frame number or a time stamp showing the location within the image data file. The URL information showing the location of the image data file may be described in each item of frame information or may be described as additional information outside the arrangement of the frame information.

[0176] Various methods can be adopted for selecting the frame of the original video from which the video data described by the video location information is created. For example, the video data may be extracted at an equal interval from the original video. Alternatively, the video data may be selected at a narrow interval where the motion of the screen is frequent, and at a wide interval where the motion of the screen is rare.

[0177] Here, referring to FIG. 11, one example of a method for selecting frames will be explained, in which the frame is selected at a narrow interval where the motion of the screen is frequent and at a wide interval where the motion of the screen is rare.

[0178] In FIG. 11, the horizontal axis represents the frame number, and a curve 800 represents the change in the scene change quantity (between adjacent frames). The method for calculating the scene change quantity is the same as the method used when calculating the display time, described later. Here, in order to determine the extraction interval in accordance with the motion of the scene, a method is shown for calculating an interval at which the scene change quantity between the video frames from which the video data is extracted becomes constant. The total of the scene change quantity between the video frames from which the video data is extracted is set to S_(i), the total of the scene change quantity over all the frames is set to S (=ΣS_(i)), and the number of data items to be extracted is set to n. In order to make the scene change quantity between the video frames from which the video data is extracted constant, S_(i)=S/n is required. In FIG. 11, the areas S_(i) of the scene change quantity curve 800 divided by the broken lines are constant. Then, for example, the scene change quantity is accumulated from the last extracted frame, and the video frame at which the accumulated value exceeds S/n is set as the frame F_(i) from which the video data is extracted.
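The accumulation procedure of FIG. 11 may be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the embodiment itself: the function name `select_frames` is hypothetical, `scene_change[k]` is assumed to hold the scene change quantity between frame k and frame k+1, frame 0 is assumed always to be extracted, the accumulated value is cleared after each extraction, and the comparison uses "reaches or exceeds" so that an exact multiple of S/n also triggers an extraction.

```python
from typing import List


def select_frames(scene_change: List[float], n: int) -> List[int]:
    """Pick at most n frame numbers so that roughly S/n of change lies between picks."""
    if n <= 0:
        raise ValueError("n must be positive")
    selected = [0]                      # assumption: the first frame is always extracted
    total = sum(scene_change)           # S, the total scene change quantity
    if total <= 0:
        return selected                 # no motion at all: nothing further to pick
    quota = total / n                   # S/n
    acc = 0.0
    for k, q in enumerate(scene_change):
        acc += q                        # accumulate from the last extracted frame
        if acc >= quota and len(selected) < n:
            selected.append(k + 1)      # frame k+1 becomes the next F_i
            acc = 0.0                   # assumption: accumulation restarts from zero
    return selected


# Motion is concentrated around frames 2 to 4 and at the very end.
frames = select_frames([0, 0, 4, 4, 0, 0, 0, 4], 3)
```

With this input the extracted frames cluster where the scene change quantity is large, and the long motionless stretch is passed over.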

[0179] If the video data is created from I picture frames of MPEG, the video frame calculated in this manner is not necessarily an I picture; in that case, the video data is created from an I picture frame in the vicinity thereof.

[0180] In the method explained with reference to FIG. 11, the video frames which belong to a section where the scene change quantity=0 are skipped. However, when a still picture continues, the scene is important in many cases. Therefore, if the scene change quantity=0 continues for more than a fixed time, the frame at that time may be extracted. For example, the scene change quantity may be accumulated from the last extracted frame, and either the frame at which the accumulated value exceeds S/n or the frame at which the scene change quantity=0 has continued for more than the fixed time may be set as a frame F_(i) from which the video data is extracted. When a frame is extracted in this way, the accumulated value of the scene change quantity may or may not be cleared to 0. The accumulated value may also be cleared selectively based on a request from the user.
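The still picture variation can be sketched in the same style. The function name, the parameter `still_limit` (the number of consecutive zero-change frame intervals that counts as "more than a fixed time") and the flag `clear_on_still` are hypothetical; `scene_change[k]` is again assumed to be the quantity between frame k and frame k+1.

```python
from typing import List


def select_frames_with_stills(scene_change: List[float], n: int,
                              still_limit: int, clear_on_still: bool = True) -> List[int]:
    """Extract a frame when accumulated change reaches S/n or a still run is long."""
    if n <= 0 or still_limit <= 0:
        raise ValueError("n and still_limit must be positive")
    quota = sum(scene_change) / n       # S/n
    selected = [0]                      # assumption: the first frame is always extracted
    acc = 0.0
    still_run = 0                       # consecutive intervals with zero change
    for k, q in enumerate(scene_change):
        acc += q
        still_run = still_run + 1 if q == 0 else 0
        if quota > 0 and acc >= quota:
            selected.append(k + 1)      # extraction triggered by motion
            acc = 0.0
            still_run = 0
        elif still_run > still_limit:
            selected.append(k + 1)      # extraction triggered by a long still picture
            still_run = 0
            if clear_on_still:          # clearing is optional per the text
                acc = 0.0
    return selected


frames = select_frames_with_stills([4, 0, 0, 0, 0, 4], n=2, still_limit=2)
```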

[0181] In the example of FIG. 11, it is assumed that the display time information 121 is described so that the display time is the same for all the frames. When the video is reproduced in accordance with this display time information 121, the scene change quantity per unit time becomes constant. The display time information 121 may instead be determined and described by a separate method.

[0182] Next, a case in which one or a plurality of frames correspond to one item of frame information will be explained.

[0183] One example of the data structure of the special reproduction information in this case is the same as that in FIG. 8.

[0184] Hereinafter, a method for describing the video location information will be explained with reference to FIGS. 12 through 14.

[0185] FIG. 12 explains a method for describing the video location information by referring to continuous frames of the original video.

[0186] The method for describing the video location information shown in FIG. 9 refers to one frame 201 in the original video for conducting the special reproduction. In contrast, the method shown in FIG. 12 describes a set 500 of a plurality of continuous frames in the original video. The set 500 of frames may consist of some frames extracted from a plurality of continuous frames within the original video. The set 500 of frames may also include only one frame.

[0187] If the set 500 of frames includes a plurality of continuous frames or one frame in the original video, the frame location is described either by the location of the start frame and the location of the end frame, or by the location of the start frame and the continuation time of the set 500 (if only one frame is included, for example, the start frame is set equal to the end frame). For the description of the location and the time, a frame number, a time stamp or the like which can specify frames in the stream is used.

[0188] If the set 500 of frames is a part of a plurality of continuous frames in the original video, information which enables the frames to be specified is described. If the method for extracting the frames is determined in advance and the frames can be specified from the description of the locations of the start frame and the end frame, the start frame or the end frame may be described.

[0189] The display time information 501 shows the total display time for the whole group of frames included in the corresponding frame set 500. The display time of each frame included in the set 500 can be appropriately determined on the side of the special reproduction device. As a simple method, the total display time is divided equally by the total number of frames in the set 500 to give the display time of one frame. Various other methods are available.
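The simple equal division can be written as a short sketch. The function name and the assumption that the set is described by inclusive start and end frame numbers are illustrative only.

```python
def per_frame_display_time(total_display_time: float,
                           start_frame: int, end_frame: int) -> float:
    """Divide the total display time of a frame set equally among its frames.

    Assumption: the set is given by inclusive start and end frame numbers,
    so a single-frame set has start_frame == end_frame.
    """
    count = end_frame - start_frame + 1
    if count <= 0:
        raise ValueError("end_frame must not precede start_frame")
    return total_display_time / count


# A set of frames 100 to 103 shown for 2 seconds in total.
each = per_frame_display_time(2.0, 100, 103)
```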

[0190] FIG. 13 explains a method for describing the video location information by referring to a set of image data files.

[0191] The method for describing the video location information shown in FIG. 12 directly refers to continuous frames in the original video to be reproduced. In contrast, in the method shown in FIG. 13, a set 600 of image data files corresponding to the original video frame set 602 extracted from the original video stream is created in a separate file, and the location thereof is described. The file location can be described in the same manner, by using, for example, a URL or the like, whether the file is present on a local storage device or on a network. A set of the video location information 101 showing the location of this image data file and the video display time 121 showing the length of the corresponding display time 601 can be described as the frame information.

[0192] If a correspondence with the original frame is required, information showing the frame set 602 of the original video corresponding to the described frame information (for example, information similar to the video location information in the case of FIG. 12) may be included in the frame information. In that case, the frame information comprises the video location information, the display time information and the original video information. The original video information need not be described when it is not required.

[0193] The configuration of the video data, the preparation of the video data, the preparation of the reduced video, the method for storing the video data and the method for describing the location information such as the URL are the same as described above.

[0194] As described above, various methods can be adopted as to which frames of the original video are selected to create the video data described by the video location information. For example, the video data may be extracted at an equal interval from the original video. Alternatively, frames may be extracted at a narrow interval where the motion of the screen is frequent, and at a wide interval where the motion of the screen is rare.

[0195] In the above embodiments, the image data file 300 corresponds to the original video 302 on a frame-to-frame basis. However, the location information of the frame described as the original video information may be given a time width.

[0196] FIG. 14 shows an example in which the original video information is given a time width in the data structure of FIG. 8. Original video information 3701 is added to the frame information structure shown in FIG. 8. The original video information 3701 comprises start point information 3702 and section length information 3703, which are the start point and the section length of the section of the original video which is the target of the special reproduction. The original video information 3701 may comprise any information which can specify the section of the original video having the time width. For example, it may comprise the start point information and end point information instead of the start point information and the section length information.

[0197] FIG. 15 shows an example in which the original video information is given a time width in the method of FIG. 9. In this case, for example, the location of the original video frame 3801, the display time 3802, and the original video frame section 3803 comprising the start point (frame location) and the section length are described as the video location information, the display time information and the original video information included in the same frame information, to show that these correspond to each other. That is, the original video frame 3801 described in the video location information is displayed as the video representative of the original video frame section 3803.

[0198] FIG. 16 shows an example in which the original video information is given a time width in the method of FIG. 10. In this case, for example, the location of the image data file 3901 for the display, the display time 3902, and the original video frame section 3903 comprising the start point (frame location) and the section length are described as the video location information, the display time information and the original video information included in the same frame information, to show that these correspond to each other.

[0199] That is, the image 3901 in the image data file described in the video location information is displayed as the video representative of the original video frame section 3903.

[0200] Furthermore, if a set of frames is used as the video for the display as shown in FIGS. 12 and 13, a section different from the original video frame section used for displaying the video may be made to correspond to the original video information.

[0201] FIG. 17 shows an example in which the original video information is given a time width in the method of FIG. 12. In this case, for example, a set 4001 of frames in the original video, the display time 4002, and the original video frame section 4003 comprising the start point (frame location) and the section length are described as the video location information, the display time information and the original video information included in the same frame information, to show that these correspond to each other.

[0202] At this time, the section 4001 of the set of frames described as the video location information and the original video frame section 4003 described as the original video information are not necessarily required to coincide with each other, and a different section may be used for the display.

[0203] FIG. 18 shows an example in which the original video information is given a time width in the method of FIG. 13. In this case, for example, a set 4101 of frames in the video file, the display time 4102, and the original video frame section 4103 comprising the start point (frame location) and the section length are described as the video location information, the display time information and the original video information included in the same frame information, to show that these correspond to each other.

[0204] At this time, the section of the set 4101 of frames described as the video location information and the original video frame section 4103 described as the original video information are not necessarily required to coincide with each other. That is, the section of the set 4101 of frames for the display may be shorter or longer than the original video frame section 4103. Furthermore, a video having completely different contents may be included therein. In addition, only a particularly important section may be extracted from the section described in the original video information and collected as the image data file, so that the collected video data is used.

[0205] When videos are displayed by, for example, the summarized reproduction (special reproduction) using these items of frame information, it may be desired to refer to the corresponding frame in the original video.

[0206] FIG. 19 shows a flow for starting the reproduction from the frame of the original video corresponding to the video frame displayed in the special reproduction. At step S3601, the reproduction start frame is specified in the special reproduction. At step S3602, the original video frame corresponding to the specified frame is calculated by a method described later. At step S3603, the original video is reproduced from the calculated frame.

[0207] Apart from the special reproduction, this flow can also be used for referring to the corresponding location of the original video.

[0208] As one example of a method for calculating the corresponding original video frame at step S3602, a method using proportional distribution with respect to the display time of the specified frame is shown. The display time information included in the i-th frame information is set to D_(i) sec, the section start location of the original video information is set to t_(i) sec, and the section length is set to d_(i) sec. If the location is specified at which t sec has passed from the start of the reproduction using the i-th frame information, the frame location of the corresponding original video is T=t_(i)+d_(i)×t/D_(i).
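The proportional distribution at step S3602 can be checked with a small sketch. The function name `original_video_position` and the range check on t are assumptions added for illustration; the formula itself is T=t_(i)+d_(i)×t/D_(i) as given above.

```python
def original_video_position(t: float, D_i: float, t_i: float, d_i: float) -> float:
    """Map elapsed time t within the i-th item to a position in the original video.

    D_i: display time of the i-th frame information (sec)
    t_i: section start location in the original video (sec)
    d_i: section length in the original video (sec)
    """
    if D_i <= 0:
        raise ValueError("display time D_i must be positive")
    if not 0 <= t <= D_i:
        raise ValueError("t must lie within the display time of the item")
    return t_i + d_i * t / D_i


# Halfway through a 2-second item that summarizes the section from 60 s to 90 s.
T = original_video_position(t=1.0, D_i=2.0, t_i=60.0, d_i=30.0)
```

In this example the halfway point of the displayed item maps to the halfway point of the 30-second original section.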

[0209] Referring to FIGS. 20 and 21, examples of a method for selecting frames will be explained, in which, in accordance with the motion of the screen, frames are extracted at a narrow interval where the motion of the screen is frequent and at a wide interval where the motion of the screen is rare. The horizontal axis, the curve 800, S_(i) and F_(i) are the same as those in FIG. 11.

[0210] In the example of FIG. 11, the video data is extracted one frame at a time at an interval at which the scene change quantity between the frames from which the video data is extracted is made constant. FIGS. 20 and 21 show examples in which a set of a plurality of frames is extracted with the frame F_(i) as a reference. For example, as shown in FIG. 20, the same number of continuous frames may be extracted from each F_(i). The frame length 811 and the frame length 812 are equal to each other. Alternatively, as shown in FIG. 21, a number of continuous frames may be extracted such that the total of the scene change quantity from F_(i) becomes constant. The area 813 and the area 814 are equal to each other. Various other methods can be considered.

[0211] It is also possible to use the frame selection method in which a frame is extracted when the scene change quantity=0 continues for more than a fixed time.

[0212] As in the case of FIG. 11, in the cases of FIGS. 20 and 21 the display time information 121 may be described so that the same display time is provided for every frame set. Alternatively, the display time information may be determined and described by a different method.

[0213] Next, one example of processing for calculating the display time will be explained.

[0214] FIG. 22 shows one example of a basic processing procedure for calculating the display time so that the scene change quantity becomes as constant as possible when the videos described in the video location information are continuously reproduced in accordance with the times described in the display time information.

[0215] This processing can be applied regardless of the method by which the frames are extracted. However, if the frames are extracted by, for example, the method shown in FIG. 11, the processing can be omitted, since the processing shown in FIG. 11 selects the frames such that the scene change quantity becomes constant when the frames are displayed for a fixed time period.

[0216] At step S71, the scene change quantity between adjacent frames is calculated for all the frames of the original video. If each frame of the video is represented as a bit map, the differential value of the pixels between adjacent frames can be used as the scene change quantity. If the video is compressed with MPEG, the scene change quantity can be calculated by using motion vectors.
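For the bit map case, one possible reading of "the differential value of the pixels" is sketched below. The choice of the mean absolute difference of grayscale pixel values, and the function names, are assumptions for illustration; the specification does not fix a particular difference measure.

```python
from typing import List

Frame = List[List[int]]  # assumed representation: rows of grayscale pixel values


def frame_difference(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two frames of identical size."""
    total = 0
    count = 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def scene_change_quantities(frames: List[Frame]) -> List[float]:
    """Scene change quantity between each pair of adjacent frames (step S71)."""
    return [frame_difference(a, b) for a, b in zip(frames, frames[1:])]


frames = [
    [[0, 0], [0, 0]],
    [[0, 4], [0, 0]],   # one pixel changes by 4
    [[0, 4], [0, 0]],   # identical to the previous frame
]
changes = scene_change_quantities(frames)
```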

[0217] One example of a method for calculating the scene change quantity will be explained.

[0218] FIG. 23 shows one example of a basic processing procedure for calculating the scene change quantity of all the frames from a video stream compressed with MPEG.

[0219] At step S81, motion vectors are extracted from the P picture frames. A video compressed with MPEG is described as an arrangement of I pictures (intra-frame encoded frames), P pictures (inter-frame encoded frames using forward prediction), and B pictures (inter-frame encoded frames using bidirectional prediction), as shown in FIG. 24. A P picture includes motion vectors corresponding to the motion from the preceding I picture or P picture.

[0220] At step S82, the magnitude (intensity) of each motion vector included in the frame of one P picture is calculated, and the average thereof is set as the scene change quantity from the preceding I picture or P picture.

[0221] At step S83, on the basis of the scene change quantity calculated for the P picture, the scene change quantity per frame is calculated for the frames other than the P picture as well. For example, if the average value of the motion vectors of the P picture frame is p, and the interval from the preceding I picture or P picture to which the P picture refers is d, the scene change quantity per frame of each frame is set to p/d.
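Steps S81 through S83 may be sketched as follows, assuming the motion vectors have already been extracted from the stream. The input format (a list of picture type and motion vector pairs in display order), the function name, and the treatment of frames that are not covered by any P picture interval (left at 0) are all assumptions for illustration; no real MPEG decoder API is used.

```python
import math
from typing import List, Tuple

Picture = Tuple[str, List[Tuple[float, float]]]  # ("I" | "P" | "B", [(dx, dy), ...])


def per_frame_scene_change(pictures: List[Picture]) -> List[float]:
    """Spread the average motion vector magnitude p of each P picture over d frames."""
    result = [0.0] * len(pictures)
    last_ref = None                                  # index of preceding I or P picture
    for idx, (ptype, vectors) in enumerate(pictures):
        if ptype == "P" and last_ref is not None and vectors:
            # S82: average magnitude of the motion vectors of this P picture
            p = sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
            d = idx - last_ref                       # interval to the reference picture
            # S83: each frame after the reference, up to this P picture, gets p/d
            for k in range(last_ref + 1, idx + 1):
                result[k] = p / d
        if ptype in ("I", "P"):
            last_ref = idx
    return result


pictures = [("I", []), ("B", []), ("B", []), ("P", [(3, 4), (6, 8)])]
quantities = per_frame_scene_change(pictures)
```

Here the two motion vectors have magnitudes 5 and 10, so p is 7.5 and, with d equal to 3, each of the three frames following the I picture receives 2.5.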

[0222] Subsequently, at step S72 in the procedure of FIG. 22, the total of the scene change quantity of the frames from each description target frame described in the video location information up to the next description target frame is calculated.

[0223] FIG. 25 shows the change in the scene change quantity for each frame. The horizontal axis corresponds to the frame number, and a curve 1000 denotes the change in the scene change quantity. When the display time of the video having the location information of the frame information F_(i) is calculated, the scene change quantity is summed over the section 1001 up to F_(i+1), which is the frame location of the next description target frame. This sum is the area S_(i) of the hatched portion 1002, and it is regarded as the magnitude of the motion at the frame location F_(i).

[0224] Subsequently, at step S73 in the procedure of FIG. 22, the display time of each frame is calculated. In order to make the scene change quantity as constant as possible, it suffices to allocate more display time to the frames where the motion of the screen is large; accordingly, the ratio of the display time allocated to the video of each frame location F_(i) to the total reproduction time may be set to S_(i)/ΣS_(i). When the total reproduction time is set to T, the display time of each video is D_(i)=T×S_(i)/ΣS_(i). The value of the total reproduction time T is defined as the total reproduction time of the original video.
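Steps S72 and S73 can be illustrated together by the following sketch. The function names and the indexing convention (`per_frame_change[k]` is the scene change quantity attributed to frame k, and the last section runs to the end of the video) are assumptions; the allocation formula is D_(i)=T×S_(i)/ΣS_(i) as stated above.

```python
from typing import List


def section_totals(per_frame_change: List[float], target_frames: List[int]) -> List[float]:
    """S72: total scene change S_i from each target frame F_i up to F_(i+1)."""
    bounds = list(target_frames) + [len(per_frame_change)]
    return [sum(per_frame_change[a:b]) for a, b in zip(bounds, bounds[1:])]


def display_times(S: List[float], T: float) -> List[float]:
    """S73: allocate the total reproduction time T in proportion to S_i."""
    total = sum(S)
    if total <= 0:
        raise ValueError("total scene change quantity must be positive")
    return [T * s / total for s in S]


S = section_totals([1, 1, 2, 2, 2, 0, 0, 0], target_frames=[0, 2, 5])
D = display_times(S, T=16.0)
```

In this example the third section contains no motion, so its computed display time is 0.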

[0225] If no scene change appears and S_(i)=0, a lower limit value determined in advance (for example, 1) may be entered, or the frame information thereof may be omitted. Even when S_(i) is not 0, for a frame where the screen change is so small that virtually no change would be visible in the actual reproduction, the lower limit value may be substituted or the frame information may be omitted. If the frame information is omitted, the value of S_(i) may or may not be added to S_(i+1).

[0226] The processing for calculating this display time can be conducted by the special reproduction control information creating apparatus when the frame information is prepared, but it can also be conducted on the side of the video reproduction apparatus at the time of the special reproduction.

[0227] Next, a case in which the special reproduction is conducted will be explained.

[0228] FIG. 26 shows one example of a procedure for conducting N times high-speed reproduction on the basis of the special reproduction control information that has been described.

[0229] At step S111, the display time D′_(i) at the time of reproduction is calculated on the basis of the reproduction rate information. Since the display time information described in the frame information is the standard display time, the display time of each frame is calculated as D′_(i)=D_(i)/N when reproduction at N times high speed is conducted.

[0230] At step S112, initialization for the display is conducted, and i=0 is set so that the first frame information is displayed.

[0231] At step S113, it is determined whether the display time D′_(i) of the i-th frame information is larger than the preset threshold value of the display time.

[0232] If the display time is larger, the video specified by the video location information included in the i-th frame information F_(i) is displayed for D′_(i) seconds at step S114.

[0233] If the display time is not larger, the process proceeds to step S115 to search in the forward direction for the i-th frame information whose display time is not smaller than the threshold value. During the search, the display times of all frame information items that are smaller than the threshold value are added to the display time of the frame information found, and the display times of those skipped items are set to 0. The reason for this processing is that when the display time at the time of reproduction becomes very short, the time needed to prepare the video for display becomes longer than the display time, so that the display cannot be conducted in time. Therefore, if the display time is very short, the process proceeds to the next step without displaying the video. The display time of each video that is not displayed is added to the display time of a video to be displayed, so that the total display time remains unchanged.

[0234] At step S116, it is determined whether “i” is smaller than the total number of frame information items, in order to determine whether any frame information remains undisplayed. If “i” is smaller than the total number of frame information items, the process proceeds to step S117 to increment “i” by one and prepare for the display of the next frame information. When “i” reaches the total number of frame information items, the reproduction processing is completed.
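The procedure of FIG. 26 can be sketched in Python as follows. This is only an illustrative sketch: the function name `schedule_high_speed` and its return format are hypothetical, and the handling of short frames at the very end of the list (their time is given to the last displayed frame) is an assumption, since the text does not specify that case.

```python
def schedule_high_speed(display_times, n, threshold):
    """Return (frame index, display time) pairs for N times reproduction.

    display_times: standard display times D_i from the frame information.
    n:             reproduction rate N, so that D'_i = D_i / N (step S111).
    threshold:     preset threshold of the display time (step S113).
    """
    scaled = [d / n for d in display_times]
    schedule = []
    carry = 0.0  # display time of skipped frames, passed on (step S115)
    for i, d in enumerate(scaled):
        if d > threshold:
            schedule.append((i, d + carry))  # displayed at step S114
            carry = 0.0
        else:
            carry += d  # too short to display; its own time becomes 0
    # Assumption: leftover time from trailing short frames goes to the
    # last displayed frame so that the total display time is unchanged.
    if carry > 0.0 and schedule:
        last_index, last_time = schedule[-1]
        schedule[-1] = (last_index, last_time + carry)
    return schedule
```

If no frame exceeds the threshold, this sketch returns an empty schedule; a real player would need some fallback for that case.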

[0235] FIG. 27 shows one example of conducting the N times high-speed reproduction on the basis of the described special reproduction control information, taking the display cycle as a reference.

[0236] At step S121, the display time D′_(i) of each frame is calculated as D′_(i)=D_(i)/N for the N times high-speed reproduction. In practice, however, display is tied to the display cycle, so the video cannot always be displayed for exactly the calculated time.

[0237] FIG. 28 shows a relationship between the calculated display time and the display cycle. The time axis 1300 shows the calculated display time while the time axis 1301 shows the display cycle based on the display rate. If the display rate is f frame/sec, an interval of the display cycle becomes 1/f sec.

[0238] Consequently, at step S122, the frame information F_(i) whose calculated display time includes the start point of the display cycle is searched for, and the video included in that frame information F_(i) is displayed for one display cycle (1/f sec) at step S123.

[0239] For example, in the display cycle 1302 (FIG. 28), the video of the frame information corresponding to the calculated display time 1304 is displayed, because the display start point 1303 is included in the calculated display time 1304.

[0240] As another method for associating the display cycle with the frame information, the video nearest to the start point of the display cycle may be displayed, as shown in FIG. 29. If the display time becomes smaller than the display cycle, like the display time 1305 of FIG. 28, the display of the video may be omitted. If the video is forcibly displayed, the display times before and after the video are shortened so that the total display time remains unchanged.

[0241] At step S124, it is determined whether the current display is the final display or not. If the current display is the final display, the processing is completed. If the display is not the final display, the process proceeds to step S125 to conduct the processing of the next display cycle.
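The display-cycle-based procedure of FIG. 27 can be sketched as follows, under the assumption that each frame occupies a half-open interval on the calculated time axis and that the frame whose interval contains the start point of a cycle is shown for that cycle. The function name `frames_per_cycle` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def frames_per_cycle(display_times, n, rate):
    """Return, for each display cycle, the index of the frame to show.

    display_times: standard display times D_i.
    n:             reproduction rate N, so that D'_i = D_i / N (step S121).
    rate:          display rate f in frames/sec; one cycle lasts 1/f sec.
    """
    if not display_times:
        return []
    scaled = [d / n for d in display_times]
    ends = list(accumulate(scaled))  # end time of each frame's interval
    total = ends[-1]

    shown = []
    k = 0
    while k / rate < total:           # step S124: stop after the final cycle
        start = k / rate              # start point of the k-th display cycle
        # Step S122: first frame whose interval ends after the start point.
        shown.append(bisect_right(ends, start))
        k += 1                        # step S125: next display cycle
    return shown
```

A frame whose calculated display time is shorter than one cycle and contains no cycle start point never appears in the result, which corresponds to omitting its display.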

[0242] FIG. 30 shows another example of a data structure for describing the frame information. The frame information included in the data structure of FIG. 8 or FIG. 14 summarizes a single original video. A plurality of original videos can be summarized by expanding the frame information, and FIG. 30 shows such an example. Original video location information 4202 indicating the location of the original video file is added to the original video information 4201 included in the individual frame information. The file described in the original video location information 4202 need not be used in its entirety; only a portion of the file may be extracted and used. In this case, not only file information such as a file name but also section information showing which section of the file is the object is additionally described. Plural sections may be selected from the original video.

[0243] Furthermore, if several kinds of original videos are present and identification information is individually added to the videos, the original video identification information may be described in place of the original video location information.

[0244] FIG. 31 explains an example in which a plurality of original videos are summarized and displayed by using the frame information to which the original video location information is added. In this example, three videos are summarized to display one summarized video. With respect to the video 2, instead of the whole section, two sections 4301 and 4302 are taken out and each is handled as a separate video. As the frame information, together with this original video information, the frame location of each representative video (4303 with respect to 4301) is described as the video location information, while the display time (4304 with respect to 4301) is described as the display time information.

[0245] FIG. 32 explains another example in which a plurality of original videos are summarized and displayed by using the frame information to which the original video location information is added. In this example, three videos are summarized to display one summarized video. With respect to the video 2, instead of the whole section, a portion of the section is taken out. A plurality of sections may be taken out as described in FIG. 31. As the frame information, together with these items of the original video information (for example, the section information 4401 in addition to the video 2), the storage location of each representative video file 4402 is described as the video location information and the display time 4403 is described as the display time information.

[0246] The addition of the original video location information to the frame information explained in these examples can be applied in exactly the same way to the case in which a set of frames is used as the video location information, so that a plurality of original videos can be summarized and displayed.

[0247] FIG. 33 shows another data structure for describing the frame information. In this data structure, in addition to the video location information 101, the display time information 121 and the original video information 3701 which have already been explained, motion information 4501 and interest region information 4502 are added. The motion information 4501 describes the magnitude of motion (a scene change quantity) in the section of the original video corresponding to the frame information (the section described in the original video information). The interest region information 4502 describes a region of particular interest in the video described in the video location information.

[0248] The motion information can be used for calculating the display time of the video described in the video location information, in the same way as the display time is calculated from the motion of the video as shown in FIG. 22. In this case, even when the display time information is omitted and only the motion information is described, special reproduction such as high-speed reproduction can be conducted in the same manner as when the display time is described; the display time is then calculated at the time of reproduction.

[0249] Both the display time information and the motion information can be described at the same time. In that case, an application for displaying uses whichever of the two is required, or uses both in combination, in accordance with the processing.

[0250] For example, a display time calculated irrespective of the motion is described in the display time information; a display time calculated by a method for cutting out important scenes from the original video corresponds to this. At the time of the high-speed reproduction of summarized contents calculated in this manner, the motion information is used so that a portion with large motion is reproduced slowly while a portion with small motion is reproduced quickly, with the result that high-speed reproduction in which little is overlooked is enabled.

[0251] The interest region information is used when a region of particular interest is present in the video described in the video location information of the frame information. For example, faces of persons who seem to be important correspond to this. When a video including such interest region information is displayed, a square frame may be superimposed so that the interest region can be easily recognized. The frame display is not indispensable, and the video may simply be displayed as it is.

[0252] The interest region information can also be used for processing and displaying the special reproduction control information such as the frame information. For example, if only a part of the frame information is reproduced and displayed, the frame information including the interest region information is displayed with priority. Further, frame information including a square area with a large area may be assumed to have higher importance, thereby making it possible to selectively display the video.
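One way to picture this priority selection is the following Python sketch. The `FrameInfo` class, the `(x, y, width, height)` region format, and the rule of ranking frames by their largest region area are all assumptions introduced for illustration; the text only states that frames with interest regions, and larger regions, may be given priority.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInfo:
    location: int                                  # video location information
    regions: list = field(default_factory=list)    # (x, y, width, height) tuples


def select_by_interest(frames, count):
    """Pick `count` frames, preferring those with larger interest regions."""

    def largest_area(frame):
        # Frames without any interest region rank lowest (area 0).
        return max((w * h for _, _, w, h in frame.regions), default=0)

    # sorted() is stable, so frames with equal area keep their original order.
    ranked = sorted(frames, key=largest_area, reverse=True)
    chosen = ranked[:count]
    # Restore temporal order for display.
    return sorted(chosen, key=lambda frame: frame.location)
```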

[0253] The above has explained an example in which the processing is conducted on the basis of the scene change quantity. Hereinafter, there will be explained a case in which the importance information is used.

[0254] FIG. 34 is a view showing examples of a data structure of the frame information attached to the video.

[0255] Importance information 122 is described in addition to or in place of the display time control information 102 in the data structure of the frame information of FIG. 1. The display time is calculated based on the importance information 122.

[0256] The importance information 122 represents the importance of the corresponding frame (or set of frames). The importance is represented, for example, as an integer in a fixed range (for example, 0 to 100) or as a real number in a fixed range (for example, 0 to 1). Alternatively, the importance information 122 may be represented as an integer or a real number without an upper limit. The importance information 122 may be attached to all the frames of the video, or only to the frames at which the importance changes.

[0257] In this case as well, it is possible to take any form of FIGS. 9, 10, 12, and 13. The frame extraction method of FIGS. 11, 20, and 21 can be used. In this case, the scene change quantity of FIGS. 11, 20, and 21 may be replaced by the importance.

[0258] In the examples explained above, the display time is set on the basis of the scene change quantity. However, the display time may instead be set on the basis of the importance information. Hereinafter, this method for setting the display time will be explained.

[0259] In the setting of the display time on the basis of the scene change quantity exemplified above, in order that the video contents can be understood well, the display time is set long where the change quantity is large and short where the change quantity is small. In the setting of the display time on the basis of the importance, the display time is set long where the importance is high and short where the importance is low. Since the method for setting the display time according to the importance is basically similar to the method based on the scene change quantity, it will be explained only briefly.

[0260] FIG. 36 shows one example of the basic processing procedure in this case.

[0261] At step S191, the importance of all frames of the original video is calculated. A concrete method thereof will be exemplified later.

[0262] At step S192, the total of the importance from the description object frame described in the video location information to the next description object frame is calculated.

[0263] FIG. 37 describes the change in the importance for each frame. Reference numeral 2200 denotes the importance. When the display time of the video at the location indicated by the frame information F_(i) is calculated, the importance in the section up to F_(i+1), which is the next description object frame location, is accumulated. The accumulation result is the area S′_(i) of the hatched portion 2202.

[0264] At step S193, the display time of each frame is calculated. The ratio of the display time allocated to the video at each frame location F_(i) to the reproduction time is set to S′_(i)/Σ_(j)S′_(j). When the total of the reproduction time is set to T, the display time of each video becomes D_(i)=T×S′_(i)/Σ_(j)S′_(j). The value of the total T of the reproduction time is a standard reproduction time, defined as the total reproduction time of the original video.

[0265] When the total of the importance is S′_(i)=0, a preset lower limit value (for example, 1) may be described, or the frame information may not be described. Even when S′_(i)=0 does not hold, if the importance is so small that such a frame would virtually not be displayed, the lower limit value may be described or the frame information may not be described. If the frame information is not described, the value of S′_(i) may or may not be added to S′_(i+1).
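Steps S192 and S193 can be illustrated by the following Python sketch. The function names are hypothetical, and two details are assumptions: each section is taken to run from F_(i) up to but not including F_(i+1), and the last section is taken to run to the end of the video.

```python
def section_importance(importance, frame_locations):
    """Step S192: accumulate per-frame importance over each section.

    importance:      importance value of every frame of the original video.
    frame_locations: sorted frame numbers F_i of the description object frames.
    Returns the accumulated values S'_i.
    """
    bounds = list(frame_locations) + [len(importance)]
    return [sum(importance[a:b]) for a, b in zip(bounds, bounds[1:])]


def display_times_from_importance(importance, frame_locations, total_time):
    """Step S193: D_i = T * S'_i / sum_j S'_j."""
    sums = section_importance(importance, frame_locations)
    total = sum(sums)
    if total == 0:
        raise ValueError("all sections have zero importance")
    return [total_time * s / total for s in sums]
```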

[0266] As shown in FIG. 34, in the data structure of the frame information of FIG. 1, the video location information 101, the display time information 121 and the importance information 122 may be described in each frame information “i”. At the time of the special reproduction, any of the following is possible: the display time information 121 is used but the importance information 122 is not; the importance information 122 is used but the display time information 121 is not; both are used; or neither is used.

[0267] The processing of calculating the display time can be conducted by the special reproduction control information creating apparatus when the frame information is prepared. However, the processing may instead be conducted on the side of the video reproduction apparatus at the time of the special reproduction.

[0268] Next, a method for calculating the importance of each frame or scene (video frame section) (for example, step S191 of FIG. 36) will be explained.

[0269] Since various factors are normally intertwined in judging whether a certain scene of a video is important, the most appropriate method for calculating the importance is one in which a person determines the importance. In this method, an importance evaluator evaluates the importance for each scene of the video, or at each constant interval, and the result is input as importance data. The importance data referred to here is a correspondence table between frame numbers or times and importance values. In order to reduce the subjectivity of the evaluation, a plurality of importance evaluators may evaluate the same video, and the average value (or a median or the like) for each scene or each video frame section is calculated to finally determine the importance. Such manual input of the importance data makes it possible to reflect in the importance vague impressions and multiple factors that cannot be expressed in words.

[0270] In order to save the trouble of manual determination, it is preferable to anticipate phenomena that are likely to accompany a video scene which seems to be important, and to use processing that automatically evaluates such phenomena and converts them into importance. Here, some examples are shown in which the importance is automatically created.

[0271] FIG. 38 shows an example of a processing procedure for automatically calculating importance data on the basis of the idea that a scene having a large sound level is important. FIG. 38 can also be regarded as a function block diagram.

[0272] In the sound level calculation processing at step S210, when the sound data attached to the video is input, the sound level at each time is calculated. Since the sound level can change greatly in an instant, smoothing processing or the like may be conducted in the sound level calculation processing at step S210.

[0273] In the importance calculation processing at step S211, the sound level output by the sound level calculation processing is converted into the importance. For example, the input sound level is linearly converted into a value of 0 to 100, with a preset lowest sound level mapped to 0 and a preset highest sound level mapped to 100. A sound level not more than the lowest sound level is set to 0, while a sound level not less than the highest sound level is set to 100. As a result of the importance calculation processing, the importance at each time is calculated and output as importance data.
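The linear conversion with clamping at step S211 can be written as a short Python sketch. The function name and the use of plain floating-point sound levels in arbitrary units are assumptions for illustration.

```python
def sound_level_to_importance(level, lowest, highest):
    """Map a sound level linearly onto 0..100, clamping outside the range.

    lowest / highest: preset sound levels mapped to 0 and 100 respectively
                      (highest must be greater than lowest).
    """
    if level <= lowest:
        return 0.0
    if level >= highest:
        return 100.0
    return 100.0 * (level - lowest) / (highest - lowest)


def importance_series(levels, lowest, highest):
    """Apply the conversion to the sound level at each time."""
    return [sound_level_to_importance(v, lowest, highest) for v in levels]
```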

[0274] FIG. 39 shows an example of a processing procedure of another method for automatically calculating the importance level. FIG. 39 can also be regarded as a function block diagram.

[0275] In the processing of FIG. 39, a scene in which important words registered in advance appear frequently in the sound attached to the video is determined to be important.

[0276] In the sound recognition processing at step S220, when the sound data attached to the video is input, the language (words) spoken by a person is converted into text data.

[0277] In the important word dictionary 221, words which are likely to appear in important scenes are registered. If the degree of importance differs among the registered words, a weight is assigned to each of the registered words.

[0278] In the word collation processing at step S222, the text data output by the sound recognition processing is collated with the words registered in the important word dictionary 221 to determine whether or not important words are spoken.

[0279] In the importance calculation processing at step S223, the importance in each scene of the video or at each time is calculated from the result of the word collation processing. This calculation uses the number of appearances of important words and the weights of the important words; for example, the importance around the time at which important words have appeared (or of the scene in which they have appeared) is increased by a constant value, or by a value proportional to the weight of the important words. As a result of the importance calculation processing, the importance at each time is calculated and output as importance data.

[0280] If the weights of all words are set to the same value, the important word dictionary 221 becomes unnecessary. This is because it is then assumed that a scene in which many words are spoken is important. In this case, the word collation processing at step S222 counts the number of words output from the sound recognition processing. Not only the number of words but also the number of characters may be counted.
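The word collation and importance calculation of steps S222 and S223 can be sketched in Python as follows. The recognized text is assumed to arrive as `(time slot, word)` pairs, the dictionary is assumed to map lower-case words to weights, and the `radius` parameter that defines "around the time" is a hypothetical choice; none of these details are fixed by the text.

```python
def importance_from_words(timed_words, dictionary, slot_count, radius=1):
    """Raise the importance around each time slot where a registered word appears.

    timed_words: (time slot index, recognized word) pairs.
    dictionary:  registered important words (lower case) mapped to weights.
    slot_count:  number of time slots in the video.
    radius:      how many neighbouring slots on each side are also raised.
    """
    importance = [0.0] * slot_count
    for slot, word in timed_words:
        weight = dictionary.get(word.lower())
        if weight is None:
            continue  # not an important word
        first = max(0, slot - radius)
        last = min(slot_count, slot + radius + 1)
        for k in range(first, last):
            importance[k] += weight  # increase proportional to the weight
    return importance
```

Giving every dictionary entry the same weight reduces this to counting appearances, which is the constant-value variant mentioned above.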

[0281] FIG. 40 shows an example of a processing procedure of yet another method for automatically calculating the importance level. FIG. 40 can also be regarded as a function block diagram.

[0282] The processing of FIG. 40 determines that a scene is important when many important words registered in advance appear in the telop (superimposed caption) appearing in the video.

[0283] In the telop recognition processing at step S230, the character location in the video is specified, and characters are recognized by binarizing the video region at the character location. The recognized result is output as text data.

[0284] The important word dictionary 231 is the same as the important word dictionary 221 of FIG. 39.

[0285] In the word collation processing at step S232, in the same manner as at step S222 in the procedure of FIG. 39, the text data output by the telop recognition processing is collated with the words registered in the important word dictionary 231 to determine whether or not important words have appeared.

[0286] In the importance calculation processing at step S233, the importance at each scene or at each time is calculated from the number of appearances of important words and the weights of the important words, in the same manner as at step S223 in the procedure of FIG. 39. As a result of the importance calculation processing, the importance at each time is determined and output as importance data.

[0287] If the weights of all words are set to the same value, the important word dictionary 231 becomes unnecessary. This is because it is then assumed that a scene in which many words appear is an important scene. In this case, the word collation processing at step S232 simply counts the number of words output from the telop recognition processing. Not only the number of words but also the number of characters may be counted.

[0288] FIG. 41 shows an example of a processing procedure of a method for automatically calculating still another importance level. FIG. 41 can also be regarded as a function block diagram.

[0289] The processing of FIG. 41 determines that the larger the character size of the telop appearing in the video, the more important the scene.

[0290] In the telop detection processing at step S240, processing is conducted for specifying the location of a character string in the video.

[0291] In the character size calculation processing at step S241, individual characters are extracted to calculate the average value or the maximum value of the size (area) of the characters.

[0292] In the importance calculation processing at step S242, an importance proportional to the character size output by the character size calculation processing is calculated. If the calculated importance is too large or too small, threshold value processing is conducted to restrict the importance to a preset range. As a result of the importance calculation processing, the importance at each time is calculated and output as importance data.

[0293] FIG. 42 shows an example of the processing procedure of a method for automatically calculating still another importance level. FIG. 42 can also be regarded as a function block diagram.

[0294] The processing of FIG. 42 determines that a scene in which human faces appear in the video is important.

[0295] In the face detection processing at step S250, processing is conducted for detecting areas which look like human faces in the video. As a result of the processing, the number of areas determined to be human faces (the number of faces) is output. Information on the size (area) of each face may be output at the same time.

[0296] In the importance calculation processing at step S251, the number of faces output by the face detection processing is multiplied by a constant to calculate the importance. If the output of the face detection processing includes face size information, the calculation is conducted so that the importance increases with an increase in the size of the faces. For example, the area of the face is multiplied by a constant to calculate the importance. As a result of the importance calculation processing, the importance at each time is calculated and output as importance data.

[0297] FIG. 43 shows an example of the processing procedure of a method for automatically calculating still another importance level. FIG. 43 can also be regarded as a function block diagram.

[0298] In the processing of FIG. 43, a scene in which a video similar to a video registered in advance appears is determined to be important.

[0299] Videos which should be determined to be important are registered in the important scene dictionary 260. Each video is recorded as raw data or in a compressed form. Instead of the video itself, a characteristic quantity of the video (a color histogram, a frequency or the like) may be recorded.

[0300] In the similarity/non-similarity calculation processing at step S261, the similarity or non-similarity between the video registered in the important scene dictionary 260 and the input video data is calculated. As the non-similarity, the total of the squared errors or the total of the absolute differences is used. If video data is recorded in the important scene dictionary 260, the total of the squared errors or the total of the absolute differences over the corresponding pixels is calculated as the non-similarity. If the color histogram of the video is recorded in the important scene dictionary 260, the same kind of color histogram is calculated for the input video data, and the total of the squared errors or the total of the absolute differences between the histograms is used as the non-similarity.

[0301] In the importance calculation processing at step S262, the importance is calculated from the similarity or non-similarity output by the similarity/non-similarity calculation processing. The importance is calculated in such a manner that a larger similarity gives a greater importance if the similarity is input, while a larger non-similarity gives a smaller importance if the non-similarity is input. As a result of the importance calculation processing, the importance at each time is calculated and output as importance data.
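The histogram-based variant of steps S261 and S262 can be sketched in Python as below. The two non-similarity measures follow the text; the function names, the use of the closest dictionary entry, and the particular mapping `scale / (1 + d)` from non-similarity to importance are assumptions, chosen only because they decrease as the non-similarity grows.

```python
def sum_squared_error(a, b):
    """Total of the squared errors between two equal-length histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_absolute_difference(a, b):
    """Total of the absolute differences between two equal-length histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(x - y) for x, y in zip(a, b))


def importance_from_histogram(frame_hist, dictionary_hists, scale=100.0):
    """Importance of a frame from its non-similarity to registered scenes.

    Uses the closest registered histogram; larger non-similarity gives
    smaller importance (hypothetical mapping scale / (1 + d)).
    """
    d = min(sum_squared_error(frame_hist, h) for h in dictionary_hists)
    return scale / (1.0 + d)
```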

[0302] Furthermore, as another method for automatically calculating the importance, a scene having a high instantaneous viewing rate may be set as an important scene. The data on the instantaneous viewing rate is obtained by tallying the results of a viewing rate survey, and the importance is calculated by multiplying the instantaneous viewing rate by a constant. Needless to say, there are various other methods.

[0303] Each importance calculation processing may be conducted alone, or a plurality of data items may be used at the same time to calculate the importance. In the latter case, for example, the importance of one video is calculated with several different methods, and the final importance is calculated as their average value or maximum value.
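Combining several importance series into a final importance can be sketched as follows. The function name and the `mode` parameter are hypothetical, and all series are assumed to have been sampled at the same times.

```python
def combine_importance(series, mode="average"):
    """Combine importance series from several methods, time by time.

    series: list of importance lists, one per calculation method,
            all of the same length.
    mode:   "average" or "max".
    """
    combined = []
    for values in zip(*series):
        if mode == "average":
            combined.append(sum(values) / len(values))
        elif mode == "max":
            combined.append(max(values))
        else:
            raise ValueError("mode must be 'average' or 'max'")
    return combined
```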

[0304] In the above embodiment, the explanation has been given by citing the scene change quantity and the importance. However, it is possible to use one or more other items of information (described in the frame information) together with, or instead of, the scene change quantity or the importance.

[0305] Next, there will be explained a case in which information for the control of reproduction/non-reproduction is added to the frame information (see FIG. 1).

[0306] There are cases in which a user wishes to reproduce only a specific scene or a part thereof (for example, a highlight scene), or only a scene or a part thereof in which a specific person appears. That is, there is a demand for watching only a portion of the video.

[0307] In order to satisfy this demand, reproduction/non-reproduction information for controlling reproduction or non-reproduction may be added to the frame information. As a consequence, only a part of the video is reproduced, or only a part of the video is not reproduced, on the basis of the reproduction/non-reproduction information.

[0308] FIGS. 44, 45, and 46 show examples of a data structure in which the reproduction/non-reproduction information is added.

[0309] FIG. 44 shows a data structure in which the reproduction/non-reproduction information 123 is added to the data structure of FIG. 8. FIG. 45 shows a data structure in which the reproduction/non-reproduction information 123 is added to the data structure of FIG. 34. FIG. 46 shows a data structure in which the reproduction/non-reproduction information 123 is added to the data structure of FIG. 35. Though not shown, it is possible to add the reproduction/non-reproduction information 123 to the data structure of FIG. 1.

[0310] The reproduction/non-reproduction information 123 may be binary information specifying whether the video is reproduced or not, or a continuous value such as a reproduction level.

[0311] For example, in the latter case, when the reproduction level exceeds a certain threshold value at the time of reproduction, the video is reproduced. When the reproduction level is less than the threshold value, the video is not reproduced. The user can directly or indirectly specify the threshold value.

[0312] The reproduction/non-reproduction information 123 may be stored as independent information. Alternatively, if reproduction or non-reproduction is specified as a binary choice, non-reproduction can be indicated by setting the display time shown in the display time information 121 to a specific value (for example, 0 or −1), or by setting the importance indicated by the importance information 122 to a specific value (for example, 0 or −1). In these cases, the reproduction/non-reproduction information 123 need not be added.

[0313] If the reproduction or non-reproduction is specified with a level value, the display time information 121 and/or the importance information 122 (represented by the level value) can be used as a substitute.

[0314] If the reproduction/non-reproduction information 123 is maintained as independent information, the quantity of data increases accordingly. However, it is possible to see a digest of the video by not reproducing the portion specified as non-reproduction on the reproduction side, and it is also possible to see the whole video by reproducing that portion. If the reproduction/non-reproduction information 123 is not maintained as independent information, it is necessary to appropriately change the display time specified, for example, as 0 in order to see the whole video by reproducing the portion specified as non-reproduction.

[0315] The reproduction/non-reproduction information 123 may be input by a person or may be determined under some conditions. For example, the video is reproduced when the motion information of the video is equal to or larger than a constant value and is not reproduced otherwise, so that only portions with brisk motion are reproduced. By determining from color information whether the amount of skin color is larger or smaller than a constant value, only scenes in which a person appears can be reproduced. A method for calculating the information from the magnitude of sound, and a method for calculating the information from reproduction program information which is input in advance, can also be considered. The importance may be calculated with some technique, and the reproduction/non-reproduction information 123 may be created from the importance information. When the reproduction/non-reproduction information is a continuous value, it may be calculated by converting the importance information.

[0316] FIG. 47 shows an example in which reproduction/non-reproduction control is carried out so that video is reproduced on the basis of the reproduction/non-reproduction information 123.

[0317] In FIG. 47, it is supposed that the original video 2151 is reproduced on the basis of the video frame location information represented by F₁ through F₆ or the video frame group location information 2153, and the display time information represented by D₁ through D₆. It is also supposed that the reproduction/non-reproduction information is added to the display time information 2154. In this example, if the sections of D₁, D₂, D₄ and D₆ are to be reproduced and the other sections are not, the sections of D₁, D₂, D₄ and D₆ are reproduced continuously as the reproduction video 2152 (while the other sections are not reproduced).

[0318] For example, for the frame Fᵢ of the reproduction video, let the display time be D⁺ᵢ when the reproduction/non-reproduction information 123 indicates reproduction and D⁻ᵢ when it indicates non-reproduction. Then ΣᵢD⁺ᵢ = T′, where T′ is the total time of the reproduced portion of the original video. Normally, the display time D⁺ᵢ is set to the time required to reproduce the original video at normal speed. The reproduction speed may instead be set to a predetermined high speed, and information describing the speed multiple may be included. When the video is to be reproduced at N times speed, the display time D⁺ᵢ of each reproduction portion is multiplied by 1/N. Similarly, in order to complete the reproduction within a predetermined time D′, the display time D⁺ᵢ of each reproduction portion may be multiplied by D′/ΣᵢD⁺ᵢ.
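As a minimal sketch of the two scalings described above (multiplying by 1/N, or by D′ divided by the sum of the reproduced display times), the following may help. The function name and parameters are assumptions made for illustration only.

```python
def scale_display_times(times, flags, speed=None, target_total=None):
    """Scale the display times of the frames marked for reproduction.

    times: display time of each frame (seconds).
    flags: True where the frame is to be reproduced.
    speed: N for N times reproduction (each time is multiplied by 1/N).
    target_total: D', the desired total time (each time is multiplied
                  by D' / sum of the reproduced display times).
    All names are hypothetical; target_total takes priority if both
    are given.
    """
    playable = [t for t, keep in zip(times, flags) if keep]
    total = sum(playable)
    if target_total is not None:
        factor = target_total / total if total > 0 else 0.0
    elif speed is not None:
        factor = 1.0 / speed
    else:
        factor = 1.0
    return [t * factor for t in playable]
```

With display times 2, 4, 6 and 8 seconds and only the first and third frames reproduced, both `speed=2` and `target_total=4.0` give 1 and 3 seconds, because the reproduced total T′ is 8 seconds.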

[0319] If the display time of each frame (or frame group) is determined on the basis of the frame information, the determined display time may be adjusted.

[0320] In a method in which the calculated display time is not adjusted, the display time calculated without taking the non-reproduction sections into consideration is used as it is. Consequently, when a display time exceeding 0 was originally allocated to a non-reproduction section, the whole display time is shortened by that allocated amount.

[0321] In a method in which the calculated display time is adjusted, if, for example, a display time exceeding 0 was originally allocated to a non-reproduction section, the display time of each frame (or frame group) to be reproduced is multiplied by a constant so that the whole display time becomes equal to the time that would be required if the non-reproduction section were also reproduced.
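The difference between the unadjusted and adjusted methods can be illustrated with a short sketch. The function name `adjust_display_times` and its arguments are hypothetical.

```python
def adjust_display_times(times, flags, adjust=True):
    """Return the display times of the frames to be reproduced.

    Without adjustment the kept times are used as they are, so the
    total shrinks by the time originally allocated to skipped frames.
    With adjustment the kept times are multiplied by one constant so
    that their total equals the total over all frames.
    """
    full_total = sum(times)
    kept = [t for t, keep in zip(times, flags) if keep]
    kept_total = sum(kept)
    if not adjust or kept_total == 0:
        return kept
    factor = full_total / kept_total
    return [t * factor for t in kept]
```

For display times of 1, 1 and 2 seconds with the second frame skipped, the unadjusted total is 3 seconds, while the adjusted times are stretched by 4/3 so that the total stays at 4 seconds.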

[0322] The user may select whether or not the adjustment is to be made.

[0323] If the user specifies N times reproduction, the N times high-speed reproduction processing may be conducted without adjusting the calculated display time, or it may be conducted on the basis of the display time adjusted in the above manner (the display time is shorter in the former case).

[0324] The user may specify the whole display time. In this case as well, for example, the display time of each frame (or frame group) to be reproduced is multiplied by a constant so that the total becomes equal to the specified whole display time.

[0325] FIG. 48 shows one example of the processing procedure for reproducing only a portion of the video on the basis of the reproduction/non-reproduction information 123.

[0326] At step S162, the frame information (video location information and display time information) is read. At step S163, it is determined from the reproduction/non-reproduction information in the display time information whether the frame is to be reproduced.

[0327] When it is determined that the reproduction is to be conducted, the frame is displayed for the display time at step S164. When it is determined that the reproduction is not to be conducted, the frame is not displayed and the processing moves to the next frame.

[0328] At step S161, it is determined whether or not the whole video to be reproduced has been processed. When the whole video has been processed, the reproduction processing is ended.

[0329] In some cases it is desirable that the determination at step S163 of whether the frame is to be reproduced depend on the taste of the user. In this case, it is determined from the user profile, in advance of the reproduction of the video, whether or not the non-reproduction portions are to be reproduced. When the non-reproduction portions are to be reproduced, the frame is always reproduced at step S164.

[0330] In addition, when the reproduction/non-reproduction information is described as a continuous value, a threshold value for distinguishing reproduction from non-reproduction is determined from the user profile, and reproduction or non-reproduction is decided depending on whether or not the reproduction/non-reproduction information exceeds the threshold value. Instead of using the user profile, the threshold value may be calculated, for example, from the importance set for each frame, or information as to reproduction or non-reproduction may be received from the user, either in advance or in real time.
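The procedure of FIG. 48, together with the user-dependent variations just described, might be sketched as below. The `FrameInfo` class, its field names, and the `threshold` and `play_all` parameters are assumptions introduced for illustration; an actual player would display each frame rather than collect it in a list.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    location: int        # video location information (e.g. a frame number)
    display_time: float  # display time in seconds
    reproduce: float     # 1.0 / 0.0 flag, or a continuous value


def frames_to_display(frames, threshold=0.5, play_all=False):
    """Return (location, display_time) pairs for the frames to show.

    threshold stands for a value taken from a user profile;
    play_all models a user who also wants the non-reproduction
    portions reproduced.
    """
    shown = []
    for frame in frames:  # S161: repeat until every frame is processed
        # S162: the frame information has been read into `frame`.
        # S163: decide reproduction or non-reproduction.
        if play_all or frame.reproduce > threshold:
            # S164: display the frame for its display time.
            shown.append((frame.location, frame.display_time))
    return shown
```

Treating the binary flag as the special case of a continuous value (1.0 or 0.0) lets one comparison cover both descriptions.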

[0331] In this manner, by adding to the frame information the reproduction/non-reproduction information 123 for controlling whether or not the video is reproduced, it becomes possible to reproduce only a portion of the video, for example only the highlight scenes or only the scenes in which a person or an object of interest appears.

[0332] Next, there will be explained a describing method for the case in which location information of media other than the video (for example, text or sound) associated with the video to be displayed, and the time for displaying or reproducing that media, are added to the frame information (see FIG. 1) as additional information.

[0333] In FIG. 8, the video location information 101 and the display time information 102 are included in each frame information 100. In FIG. 34, the video location information 101 and importance information 122 are included in each frame information 100. In FIG. 35, the video location information 101, the display time information 121, and importance information 122 are included in each frame information 100. FIGS. 44, 45, and 46 further show examples in which the reproduction/non-reproduction information 123 is included in each frame information 100. In any of these examples, zero or more sets of sound location information 2703 and sound reproduction time information 2704, and zero or more sets of text information 2705 and text display time information 2706, may be added (provided that at least one set is added in total).

[0334] FIG. 49 shows an example in which one set of sound location information 2703 and sound reproduction time information 2704 and N sets of text information 2705 and text display time information 2706 are added to the example of the data structure of FIG. 8.

[0335] The sound is reproduced from the location indicated by the sound location information 2703 for the time indicated by the sound reproduction time information 2704. The sound to be reproduced may be the sound information originally attached to the video, or background music may be newly created and added.

[0336] The text indicated by the text information 2705 is displayed for the time indicated by the text display time information 2706. A plurality of items of text information may be added to one video frame.

[0337] The sound reproduction and the text display start at the same time as the display of the associated video frame. The sound reproduction time and the text display time are set within the display time of the associated video frame. If continuous sound is to be reproduced over a plurality of video frames, the sound location information and the reproduction time may be set so as to be continuous.

[0338] With such a method, summarized sound and summarized text can be provided.

[0339] FIG. 50 shows one example of a method for describing the sound information separately from the frame information. This is an example of a data structure for reproducing sound associated with the video frame displayed during special reproduction. One item of sound information 2800 is a set consisting of location information 2801 indicating the location of the sound to be reproduced, a reproduction start time 2802 at which the sound reproduction is started, and a reproduction time 2803 for which the reproduction continues. The sound information is described as an array of such items.

[0340] FIG. 51 shows a data structure for describing the text information. It has the same structure as the sound information of FIG. 50: one item of text information 2900 is a set consisting of character code location information 2901 of the text to be displayed, a display start time 2902, and a display time 2903, and the text information is described as an array of such items. In place of the character code location information 2901, location information may be used which indicates a location where the character code is stored or a location where the character is stored as a video.

[0341] The above sound information or text information is synchronized with the display of the video frames, as information associated with the displayed video frame or with a certain video frame section in which the displayed video frame is present. As shown in FIG. 52, the reproduction or display of the sound information or the text information is started with the lapse of time shown by the time axis 3001. First, the video 3002 is displayed and reproduced, each video frame for its described display time, in the order in which the video frames are described. Reference numerals 3005, 3006 and 3007 denote respective video frames, to each of which a predetermined display time is allocated. The sound 3003 is reproduced when the reproduction start time described in each item of sound information arrives, and the reproduction is stopped when the reproduction time described in the same item has elapsed. As shown in FIG. 52, a plurality of sounds 3008 and 3009 may be reproduced. In the same manner as the sound, the text 3004 is displayed when the display start time described in each item of text information arrives, and the display is stopped when the described display time has elapsed. A plurality of texts 3010 and 3011 may be displayed at the same time.
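The data structures of FIGS. 50 and 51 and the timing behavior of FIG. 52 can be sketched with one small record type, since both structures carry a location, a start time and a duration. The class name `MediaInfo`, its field names and the file names in the example are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class MediaInfo:
    location: str    # where the sound (or the character code) is stored
    start: float     # reproduction / display start time on the time axis
    duration: float  # reproduction time / display time


def active_items(items, t):
    """Return the items being reproduced or displayed at time t.

    An item becomes active when its start time arrives and stops
    once its duration has elapsed.
    """
    return [m for m in items if m.start <= t < m.start + m.duration]


# Two hypothetical overlapping sounds on the time axis.
sounds = [MediaInfo("a.wav", 0.0, 2.0), MediaInfo("b.wav", 1.0, 2.0)]
```

Because each item carries its own start time, several sounds or texts can be active at the same moment without any reference to the video frames.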

[0342] The sound reproduction start time and the text display start time are not required to coincide with the time at which the video frame is displayed, and the sound reproduction time and the text display time are not required to coincide with the display time of the video frame. These times can be set freely; conversely, the display time of the video frame may be changed in accordance with the sound reproduction time and the text display time.

[0343] These times can be set manually.

[0344] To save the trouble of manual setting, it is preferable to detect phenomena that are likely to appear in video scenes that seem to be important, and to set these times automatically. Several examples of automatic setting are shown below.

[0345] FIG. 53 shows one example of a processing procedure in which a continuous video frame section from one change-over of the screen to the next, referred to as a shot, is determined, and the total of the display times of the video frames included in the shot is defined as the sound reproduction time. FIG. 53 also serves as a function block diagram.

[0346] At step S3101, the shot is detected from the video. For this purpose, methods such as a method for detecting cuts of a motion picture from MPEG bit streams using a tolerance ratio detection method (The Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J82-D-II, No. 3, pp. 361-370, 1999) are used.

[0347] At step S3102, the video frame location information is referred to in order to determine which shot each video frame belongs to. Furthermore, the display time of each shot is calculated by taking the total of the display times of the video frames belonging to it.

[0348] For example, the sound location information is set to the sound location corresponding to the start of the shot. The sound reproduction start time may be made to coincide with the time at which the first video frame belonging to each shot is displayed, and the sound reproduction time may be set equal to the display time of the shot. Alternatively, the display times of the video frames included in each shot may be corrected in accordance with the reproduction time of the sound. Although the shot is detected here, if a data structure is used in which the importance information is described in the frame information, a section whose importance exceeds a threshold value may be determined by using the importance of the video frames, and the sound included in that section may be reproduced.

[0349] If the determined reproduction time does not satisfy a predetermined criterion, the sound need not be reproduced.
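The shot-based setting of FIG. 53 can be sketched as follows, assuming that shot boundaries have already been detected by some cut detection method. The function name, the tuple layouts, the use of the first frame number of a shot as the sound location, and the `min_duration` criterion are all assumptions made for illustration.

```python
def sound_info_per_shot(frames, shot_bounds, min_duration=0.0):
    """Derive one sound entry per shot from the frame information.

    frames: (frame_number, display_time) pairs in display order.
    shot_bounds: (first_frame, last_frame) per shot, inclusive.
    min_duration: shots whose total display time is shorter than
                  this get no sound.
    """
    per_shot = {}  # shot index -> (start time, total display time)
    clock = 0.0    # running position on the summary time axis
    for number, display_time in frames:
        for idx, (first, last) in enumerate(shot_bounds):
            if first <= number <= last:
                start, total = per_shot.get(idx, (clock, 0.0))
                per_shot[idx] = (start, total + display_time)
                break
        clock += display_time
    return [
        {"sound_location": shot_bounds[idx][0], "start": start, "duration": total}
        for idx, (start, total) in sorted(per_shot.items())
        if total >= min_duration
    ]
```

Each entry starts when the first extracted frame of its shot is displayed and lasts for the total display time of the frames in that shot.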

[0350] FIG. 54 shows one example of a processing procedure in which important words are extracted by sound recognition from the sound data corresponding to the shot or to the video frame section having high importance, and the words, the sound including the words, or sound in which a plurality of the words are combined is reproduced. FIG. 54 also serves as a function block diagram.

[0351] At step S3201, the shot is detected. In place of the shot, the video frame section having high importance may be calculated.

[0352] At step S3202, the sound recognition is carried out with respect to the sound data section corresponding to the obtained video frame section.

[0353] At step S3203, sounds including the important word portion or sounds of the important word portion are determined from the recognition result. In order to select the important words, an important word dictionary 3204 is referred to.

[0354] At step S3205, the sound for reproduction is created. Continuous sounds including the important words may be used as they are. Only important words may be extracted. Sounds having a combination of a plurality of important words may be created.

[0355] At step S3206, the display time of the video frame is corrected in accordance with the reproduction time of the created sound. Alternatively, the number of selected words may be decreased to shorten the reproduction time of the sound so that the sound reproduction time falls within the display time of the video frame.
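Steps S3203 through S3206 can be illustrated with a small sketch that keeps only dictionary words and, where necessary, drops words until the sound fits within the display time. The embodiments describe only an important word dictionary; the numeric scores attached to the words here, the greedy selection, and all names are assumptions introduced for this example.

```python
def select_important_words(recognized, dictionary, max_duration):
    """Pick important words whose total duration fits max_duration.

    recognized: (word, duration) pairs from sound recognition, in
                spoken order.
    dictionary: hypothetical mapping word -> importance score.
    Returns the chosen words in spoken order and their total time.
    """
    candidates = [
        (i, word, dur)
        for i, (word, dur) in enumerate(recognized)
        if word in dictionary
    ]
    # Greedy choice: consider the most important words first.
    by_score = sorted(candidates, key=lambda c: dictionary[c[1]], reverse=True)
    kept, used = [], 0.0
    for i, word, dur in by_score:
        if used + dur <= max_duration:
            kept.append((i, word, dur))
            used += dur
    kept.sort()  # restore the spoken order
    return [word for _, word, _ in kept], used
```

The alternative described in the paragraph above, lengthening the display time of the video frame to match the sound, would instead leave all the words in place and feed the returned total time back into the frame information.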

[0356] FIG. 55 shows one example of a procedure in which text information is obtained from the telop. FIG. 55 also serves as a function block diagram.

[0357] In the processing of FIG. 55, the text information is obtained from the telop or the sound displayed in the video.

[0358] At step S3301, the telop displayed in the video is read. The telop in the original video may be extracted automatically, for example by a method described in the literature such as “A method for extracting the character portion from the video for the telop region” by Osamu Hori, CVIMI 114-17, pp. 129-136 (1999), or the telop may be read by a person and input manually.

[0359] At step S3302, important words are extracted from the telop character string which has been read. An important word dictionary 3303 is used to judge which words are important. The telop character string which is read may be used as the text information as it is. Alternatively, the extracted words may be arranged so that a sentence representing the video frame section is constituted with only the important words to provide the text information.

[0360] FIG. 56 shows one example of a procedure for obtaining the text information from the sound. FIG. 56 also serves as a function block diagram.

[0361] In the sound recognition processing at step S3401, the sound is recognized.

[0362] At step S3402, important words are extracted from the recognized sound data. An important word dictionary 3403 is used to judge which words are important. The recognized sound data may be used as the text information as it is. Alternatively, the extracted words may be arranged so that a sentence representing the video frame section is constituted with only the important words to provide the text information.

[0363] FIG. 57 shows an example of a processing procedure for extracting and preparing the text information by telop recognition from the shot or from the video frame section having high importance. FIG. 57 also serves as a function block diagram.

[0364] At step S3501, the shot is detected from the video. Instead of the shot, the section having high importance may be determined.

[0365] At step S3502, the telop represented in the video frame section is recognized.

[0366] At step S3503, the important words are extracted by using an important word dictionary 3504.

[0367] At step S3505, text for the display is created. For this purpose, a telop character string including important words may be used. Only important words or a character string using the important words may be used as text information. If text information is to be obtained by sound recognition, the telop recognition processing at step S3502 is replaced with sound recognition processing which takes sound data as input. The text information is displayed together with the video frame in which the text appears as telop, or with the video frame at the time at which the corresponding data is reproduced as sound. Alternatively, the text information in the video frame section may be displayed at one time.

[0368] FIGS. 58A and 58B are views showing display examples of the text information. As shown in FIG. 58A, the display may be divided into the text information display area 3601 and the video display area 3602. As shown in FIG. 58B, the text information may be overlapped with the video display area 3603.

[0369] The respective display times (reproduction times) of the video frames, the sound information and the text information may be adjusted so that all the media information is synchronized. For example, for double speed reproduction of the video, important sounds are extracted by the above method to obtain sound information half as long as that of normal reproduction. Next, the display time is allocated to the video frames associated with the respective sounds. If the display time of the video frames is determined so that the scene change quantity becomes constant, the sound reproduction time or the text display time is set to be within the display time of the respectively associated video frames. Alternatively, a section including a plurality of video frames, such as a shot, is determined, and the sound or the text included in the section is reproduced or displayed in accordance with the display time of the section.

[0370] The explanation so far has focused mainly on video data. However, the data structure of the present invention can be modified for data having no frame information, for example sound data. The sound information and the text information can be used in a form without the frame information. In this case, a summary is created which comprises only sound information or text information with respect to the original video data. In addition, a summary comprising only sound information and text information can be created with respect to sound data and music data.

[0371] Although the data structures shown in FIGS. 50 and 51 are used to describe the sound information and text information in synchronization with the video data, it is also possible to summarize the sound data and text data alone. To summarize the sound data, the data structure shown in FIG. 50 can be used irrespective of the video information. To summarize the text data, the data structure shown in FIG. 51 can be used irrespective of the video information. In that case, in the same manner as in the case of the frame information, original data information may be added to the sound information and text information to describe their correspondence to the original sound and music data.

[0372] FIG. 59 shows an example of a data structure in which the original data information 4901 is included in the sound information shown in FIG. 50. If the original data is the video, the original data information 4901 indicates the section of video frames (start point information 4902 and section length information 4903).

[0373] If the original data is sound data and music data, the original data information 4901 indicates the section of sound and music.

[0374] FIG. 60 shows an example of a data structure in which the original data information 4901 is included in the sound information shown in FIG. 30.

[0375] FIG. 61 explains an example in which sound/music is summarized by using the sound information. The original sound/music is divided into several sections, and a portion of each section is extracted as the summarized sound/music so that the summary of the original data is created. For example, a portion 5001 of the section 2 is extracted as summarized sound/music to be reproduced as a section 5002 of the summary. As examples of methods for dividing into sections, music may be divided into chapters and conversation may be divided by its contents.

[0376] Furthermore, in the same manner as in the case of the frame information, the description of the original data file and the section may be included in the sound information and the text information, with the result that a plurality of sound/music data items can be summarized together. At this time, if identification information is added to the individual original data, the original data identification information may be described in place of the original data file and the section.

[0377] FIG. 62 explains an example in which sound/music is summarized by using the sound information. Portions of plural sound/music data items are extracted as the summarized sound/music so that the summary of the original data is created. For example, a portion 5001 of the sound/music data item 2 is extracted as summarized sound/music to be reproduced as a section 5102 of the summary. As one use, a portion of each piece of music included in a music album may be extracted to create summarized data for trial listening.
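A summary built from portions of several sound/music data items, as in FIG. 62, could be represented along the following lines. The `SoundInfo` fields, the identifier strings and the back-to-back placement of the excerpts are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SoundInfo:
    source_id: str   # original data identification information
    location: float  # offset of the excerpt in the original (seconds)
    start: float     # reproduction start time within the summary
    duration: float  # reproduction time of the excerpt


def build_summary(excerpts):
    """Place the excerpts back to back on the summary time axis.

    excerpts: (source_id, offset, duration) triples, in playback order.
    """
    summary, clock = [], 0.0
    for source_id, offset, duration in excerpts:
        summary.append(SoundInfo(source_id, offset, clock, duration))
        clock += duration
    return summary


# Hypothetical album trial: ten seconds of one track, fifteen of another.
trial = build_summary([("track1", 30.0, 10.0), ("track2", 45.0, 15.0)])
```

Because each item records which original data it came from, the summary can span any number of source files.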

[0378] If an album is summarized and it is preferable that the title of each piece of music be known, the title may be included in the music information. This information is not indispensable.

[0379] Next, a method of providing video data will be explained.

[0380] If the special reproduction control information created in the processing of the embodiment is to be provided for use, it must be delivered by some means from the side that creates the information to the side of the user. Various forms of providing the special reproduction control information can be considered, as exemplified below:

[0381] (1) Video data and special reproduction control information are recorded on one (or a plurality of) recording medium (or media) and provided at the same time;

[0382] (2) Video data is recorded on one (or a plurality of) recording medium (or media) and provided, and the special reproduction control information is separately recorded on one (or a plurality of) recording medium (or media) and provided;

[0383] (3) Video data and the special reproduction control information are provided via the communication medium on the same occasion;

[0384] (4) Video data and the special reproduction control information are provided via the communication media on different occasions.

[0385] According to the above described embodiments, a special reproduction control information describing method for describing special reproduction control information provided for special reproduction with respect to the video contents describes, as the frame information, for each of frames or groups of continuous or adjacent frames selectively extracted from the whole frame series of video data constituting the video contents, first information showing a location at which video data of the one frame or one group is present and second information associated with display time allocated to the one frame or the frame group, and/or third information showing importance allocated to the one frame or the frame group corresponding to the frame information.

[0386] According to the above described embodiments, a computer readable recording medium storing special reproduction control information stores at least frame information described for each of frames or groups of continuous or adjacent frames selectively extracted from the whole frame series of video data constituting the video contents, the frame information comprising first information showing a location at which video data of the one frame or one group is present and second information associated with display time allocated to the one frame or the frame group, and/or third information showing importance allocated to the one frame or the frame group corresponding to the frame information.

[0387] According to the above described embodiments, a special reproduction control information describing apparatus/method for describing special reproduction control information provided for special reproduction with respect to the video contents describes, as the frame information, for each of frames or groups of continuous or adjacent frames selectively extracted from the whole frame series of video data constituting the video contents, video location information showing a location at which video data of the one frame or one group is present and display time control information including display time information and basic information based on which the display time is calculated, to be allocated to the one frame or the frame group.

[0388] According to the above described embodiments, a special reproduction apparatus/method enables a special reproduction with respect to video contents, wherein: special reproduction control information is referred to which includes at least frame information, described for each frame or each group of a plurality of continuous or adjacent frames selectively extracted out of the whole frame series of the video data constituting the video contents, the frame information including video location information showing a location at which the one frame data or the frame group data is present; the one frame data or the frame group data corresponding to each frame information is obtained on the basis of the video location information included in the frame information, while the display time which should be allocated to each frame information is determined on the basis of at least display time control information included in each frame information; and the obtained data on the one frame or the plurality of frames is reproduced for the determined display time in a predetermined order, thereby carrying out a special reproduction.

[0389] In the above described embodiments, for example, image data is created in advance, which is extracted in frame units from location information on an effective video frame or an original video which is used for display, and the video frame location information or information on the display time of the image data is created separately from the original video. Either video frames or the image data extracted from the original video is continuously displayed on the basis of the display information so that a special reproduction such as a double speed reproduction, a trick reproduction, jump continuous reproduction or the like is enabled.

[0390] In the double speed reproduction for confirming the contents at a high speed, the display time is determined in advance in such a manner that it is extended at locations where the motion of the scene is large and shortened at locations where the motion is small, so that the change in the display screen becomes as constant as possible. Alternatively, the same effect can be obtained when the location information is determined so that the interval between extracted locations is made small where the motion of the video frames or video data used for the display is large and made large where the motion is small. A reproduction speed control value may be created so that, as a whole, the double speed value or the reproduction time designated by the user is obtained. Since even a long video can be viewed by double speed reproduction, the video can be easily viewed and its contents grasped in a short time.
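One simple way to make the on-screen change roughly constant is to allocate display time in proportion to a per-frame motion measure while fixing the total reproduction time. The sketch below assumes such a measure is available; the function name and the proportional rule are illustrative assumptions rather than the method of the embodiments.

```python
def allocate_display_times(motion, total_time):
    """Split total_time among frames in proportion to their motion.

    motion: hypothetical non-negative motion measure per extracted
            frame (larger means more change in the scene).
    With this rule, motion divided by display time is the same for
    every frame, so the change per unit time is constant.
    """
    if not motion:
        return []
    total_motion = sum(motion)
    if total_motion == 0:
        # No motion anywhere: fall back to equal display times.
        return [total_time / len(motion)] * len(motion)
    return [total_time * m / total_motion for m in motion]
```

For motion values 1, 3 and 4 and a total of 8 seconds, the frames receive 1, 3 and 4 seconds respectively. Choosing `total_time` as the original duration divided by N would correspond to a user-designated N times speed.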

[0391] It is possible to reproduce videos so that important locations are not overlooked by extending the display time at the important locations and shortening the display time at unimportant locations in accordance with the importance of the video.

[0392] Only important locations may be efficiently reproduced by omitting a part of the video without displaying the whole video frame.

[0393] According to embodiments of the present invention, an effective special reproduction is enabled on the reproduction side on the basis of control information provided for a special reproduction of the video contents, in which a plurality of items of frame information are arranged and described, each including a method for obtaining a frame or a group of frames selectively extracted from the original video, information on the display time (absolute or relative value) allocated to the frame or the group of frames, and information which forms the basis for obtaining the information on the display time.

[0394] Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter. For example, each of the above functions can be realized as software. The above embodiments can be realized as a computer readable recording medium on which a program is recorded for allowing the computer to conduct predetermined means or for allowing the computer to function as predetermined means, or for allowing the computer to realize a predetermined function.

[0395] The structures shown in each of the embodiments are one example, and are not intended to exclude other structures. It is also possible to provide a structure which is obtained by replacing a part of the structure exemplified above with another structure, omitting a part of the exemplified structure, adding a different function to the exemplified structure, and combining such measures. A different structure logically equivalent to the exemplified structure, a different structure including a part logically equivalent to the exemplified structure, and a different structure logically equivalent to the essential portion of the exemplified structure can be provided. Another structure identical to or similar to the exemplified structure, or a different structure having the same effect as the exemplified structure or a similar effect can be provided.

[0396] In each of the embodiments, various variations with respect to various structure components can be put into practice in an appropriate combination.

[0397] Each of the embodiments includes or inherently contains an invention associated with various viewpoints, stages, concepts or categories such as, for example, an invention as a method for describing information, an invention as information which is described, an invention as an apparatus or a method corresponding thereto, an invention as an inside of the apparatus or a method corresponding thereto.

[0398] Consequently, the invention can be extracted without being limited to the exemplified structure from the content disclosed in the embodiment according to this invention.

What is claimed is:
 1. A method of describing frame information, the method comprising: describing, for a frame extracted from a plurality of frames in a source video data, first information specifying a location of the extracted frame in the source video data; and describing, for the extracted frame, second information relating to a display time of the extracted frame.
 2. The method according to claim 1, wherein the extracted frame comprises a group of frames, and the first information comprises information specifying a location of the extracted group of frames in the source video data.
 3. The method according to claim 1, further comprising describing, for the extracted frame, third information relating to importance of the extracted frame.
 4. The method according to claim 1, wherein the first information comprises information specifying an image data file created from the video data of the extracted frame.
 5. The method according to claim 1, wherein the extracted frame comprises a frame extracted from a plurality of frames included in a temporal section of the source video data, and further describing fourth information specifying the temporal section of the source video data.
 6. The method according to claim 5, wherein the first information comprises information specifying an image data file created from the source video data of the extracted frame, the image data corresponding to the extracted frame.
 7. The method according to claim 1, wherein the second information comprises information relating to such display time that a frame activity value during a special reproduction is kept substantially constant.
 8. The method according to claim 1, further comprising describing fifth information indicating whether the extracted frame is reproduced or not.
 9. The method according to claim 1, wherein the first information comprises one of information specifying a location of the extracted frame among the plurality of frames and information specifying a location of image data within an image data file created from the source video data and stored separately from the video data, the image data corresponding to the extracted frame.
 10. The method according to claim 1, further comprising describing, for media data other than the source video data including the extracted frame, information specifying a location of the media data and information relating to a display time of the media data.
 11. An article of manufacture comprising a computer usable medium storing frame information, the frame information comprising: first information, described for a frame extracted from a plurality of frames, specifying a location of the extracted frame in the source video data; and second information, described for the extracted frame, relating to a display time of the extracted frame.
 12. The article of manufacture according to claim 11, wherein the extracted frame comprises a group of frames, and the first information comprises information specifying a location of the extracted group of frames in the source video data.
 13. The article of manufacture according to claim 11, wherein the frame information comprises third information relating to importance of the extracted frame.
 14. The article of manufacture according to claim 11, wherein the first information comprises information specifying an image data file created from the video data of the extracted frame.
 15. The article of manufacture according to claim 11, further storing the source video data and an image data file corresponding to the source video data of the extracted frame in addition to the frame information.
 16. An apparatus for creating frame information, the apparatus comprising: a unit configured to extract a frame from a plurality of frames in a source video data; a unit configured to create the frame information including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame; and a unit configured to link the extracted frame to the frame information.
 17. A method of creating frame information, the method comprising: extracting a frame from a plurality of frames in a source video data; and creating the frame information including first information specifying a location of the extracted frame in the source video data and second information relating to a display time of the extracted frame.
 18. An apparatus for performing a special reproduction, comprising: a unit configured to refer to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame in the source video data and second information relating to a display time of the extracted frame; a unit configured to obtain the video data corresponding to the extracted frame based on the first information; a unit configured to determine the display time of the extracted frame based on the second information; and a unit configured to display the obtained video data for the determined display time.
 19. A method of performing a special reproduction comprising: referring to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame; obtaining the video data corresponding to the extracted frame based on the first information; determining the display time of the extracted frame based on the second information; and displaying the obtained video data for the determined display time.
 20. An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein, the computer readable program code means performing a special reproduction, the computer readable program code means comprising: computer readable program code means for causing a computer to refer to frame information described for a frame extracted from a plurality of frames in a source video data and including first information specifying a location of the extracted frame and second information relating to a display time of the extracted frame; computer readable program code means for causing a computer to obtain the video data corresponding to the extracted frame based on the first information; computer readable program code means for causing a computer to determine the display time of the extracted frame based on the second information; and computer readable program code means for causing a computer to display the obtained video data for the determined display time.
 21. A method of describing sound information, the method comprising: describing, for a frame extracted from a plurality of sound frames in a source sound data, first information specifying a location of the extracted frame in the source sound data; and describing, for the extracted frame, second information relating to a reproduction start time and reproduction time of the sound data of the extracted frame.
 22. An article of manufacture comprising a computer usable medium storing frame information, the frame information comprising: first information, described for a frame extracted from a plurality of sound frames, specifying a location of the extracted frame in the source sound data; and second information, described for the extracted frame, relating to a reproduction start time and reproduction time of the sound data of the extracted frame.
 23. A method of describing text information, the method comprising: describing, for a frame extracted from a plurality of text frames in a source text data, first information specifying a location of the extracted frame in the source text data; and describing, for the extracted frame, second information relating to a display start time and display time of the text data of the extracted frame.
 24. An article of manufacture comprising a computer usable medium storing frame information, the frame information comprising: first information, described for a frame extracted from a plurality of text frames in a source text data, specifying a location of the extracted frame in the source text data; and second information, described for the extracted frame, relating to a display start time and display time of the text data of the extracted frame. 
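
As a non-limiting illustration of the frame information recited in claims 1, 8, and 19, the following Python sketch shows one way such information might be held in memory and turned into a display schedule. The field names `location`, `display_time`, `importance`, and `reproduce` and the function `build_schedule` are assumptions introduced here for illustration and do not appear in the specification. The sketch further assumes that the first information is a frame number in the source video data and that the second information is a display time in seconds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FrameInfo:
    """One item of frame information (all field names are hypothetical)."""
    location: int            # first information: assumed to be a frame number
    display_time: float      # second information: assumed to be in seconds
    importance: float = 1.0  # third information (claim 3); not used below
    reproduce: bool = True   # fifth information (claim 8)


def build_schedule(items: Iterable[FrameInfo]) -> List[Tuple[int, float, float]]:
    """Return (location, start_time, display_time) for each frame to display."""
    schedule = []
    start = 0.0
    for item in items:
        if not item.reproduce:
            continue  # a frame marked as not reproduced takes no display time
        schedule.append((item.location, start, item.display_time))
        start += item.display_time
    return schedule


# Three extracted frames; the second one is flagged as not reproduced.
items = [
    FrameInfo(location=0, display_time=0.5),
    FrameInfo(location=120, display_time=0.25, reproduce=False),
    FrameInfo(location=300, display_time=1.0),
]
schedule = build_schedule(items)
```

In this sketch a reproduction apparatus would obtain the video data at each scheduled `location` and display it for the associated `display_time`; obtaining and rendering the video data are outside the scope of the example.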